Raspberry Pi Super Cluster

As a Raspberry Pi enthusiast, have you ever considered increasing its performance with parallel computing? Discover just how easy it can be with the right help: this guide takes you through the process from start to finish.

By Andrew K. Dennis
Book Nov 2013 126 pages 1st Edition

Product Details


Publication date : Nov 20, 2013
Length 126 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781783286195


Chapter 1. Clusters, Parallel Computing, and Raspberry Pi – A Brief Background

The domain of parallel computing is an interesting one, but building a cluster for fun has often required expensive or bulky off-the-shelf hardware, such as desktop PCs, or the implementation of complex virtual machine setups.

So what is a cluster? This term will come up often in the following chapters and essentially means, in the context of this book, a group of separate devices networked together. Each device on this network is often referred to as a node.

Thanks to the Raspberry Pi's low cost and small physical footprint, building a cluster to explore parallel computing has become far cheaper and easier for users at home. It allows you to explore not only the software side but the hardware as well.

While Raspberry Pis wouldn't be suitable for a fully-fledged production system, they provide a great tool for learning the technologies that professional clusters are built upon. For example, they allow you to work with industry standards such as MPI, and with cutting-edge open source projects such as Hadoop.

This chapter will provide you with a basic background to parallel computing and the technologies associated with it. It will also provide you with an introduction to using the Raspberry Pi.

A very short history of parallel computing


The basic assumption behind parallel computing is that a larger problem can be divided into smaller chunks, which can then be operated on separately at the same time.
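The divide-and-combine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `split_into_chunks` and `chunked_sum` are hypothetical, and summing a list of numbers stands in for the "larger problem".

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def chunked_sum(data, n_chunks, executor):
    """Sum each chunk as a separate task, then combine the partial sums."""
    chunks = split_into_chunks(data, n_chunks)
    partial_sums = list(executor.map(sum, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(chunked_sum(numbers, 4, pool))
```

The executor is passed in so that the splitting logic stays separate from how the tasks are run. In standard CPython, threads do not execute CPU-bound Python code on several cores at once, so passing a `ProcessPoolExecutor` instead is the usual way to get simultaneous execution for work like this.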

Related to parallelism is the concept of concurrency, but the two terms should not be confused.

Parallelism can be thought of as simultaneous execution and concurrency as the composition of independent processes. You will encounter both of these approaches in this book.

You can find out more about the differences between the two at the following site:

http://blog.golang.org/concurrency-is-not-parallelism
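To make the distinction concrete, here is a small Python sketch of concurrency without parallelism. Two independent tasks are composed and take turns on a single thread; nothing executes simultaneously. The names `worker` and `run_concurrently` are hypothetical, chosen only for this illustration.

```python
import asyncio
import threading


async def worker(name, steps, log):
    """Record each step, yielding control to the event loop in between."""
    for step in range(steps):
        log.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # hand control back so other tasks can run


async def run_concurrently():
    """Compose two independent workers and collect what they recorded."""
    log = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(run_concurrently()):
        print(name, step, thread_id)
```

Every entry carries the same thread identifier, yet the steps of A and B are interleaved rather than run one worker after the other. Running the same two workers on separate cores or separate machines would add parallelism on top of this structure.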

Parallel computing and related concepts have been in use by capital-intensive industries, such as aircraft design and defense, since the late 1950s and early 1960s. With the cost of hardware having dropped rapidly over the past five decades, and with the birth of open source operating systems and applications, home enthusiasts, students, and small companies now have the ability to leverage these technologies for their own uses.

Traditionally, parallel computing was found within High Performance Computing (HPC) architectures, that is, systems characterized by high speed and density of calculations. The term you will probably be most familiar with in this context is, of course, supercomputers, which we shall look at next.

Supercomputers

The genesis of supercomputing can be found in the 1960s with a company called Control Data Corporation (CDC). Seymour Cray was an electrical engineer working for CDC who became known as the father of supercomputing due to his work on the CDC 6600, generally considered to be the first supercomputer. The CDC 6600 was the fastest computer in operation between 1964 and 1969.

In 1972, Cray left CDC and formed his own company, Cray Research. In 1975, Cray Research announced the Cray-1 supercomputer. The Cray-1 would go on to be one of the most successful supercomputers in history and was still in use at some institutions until the late 1980s.

The 1980s also saw a number of other players enter the market, including Intel, whose 8086/8087 CPUs powered the 64-processor machine built by the Caltech Concurrent Computation project, and Thinking Machines Corporation with its CM-1 Connection Machine.

This preceded an explosion in the 1990s in the number of processors included in supercomputing machines. It was in this decade that, thanks to brute-force computing power, IBM famously beat world chess champion Garry Kasparov with the Deep Blue supercomputer.

The Deep Blue machine contained some 30 nodes, each including IBM RS/6000 SP parallel processors and numerous "chess chips".

By the 2000s, the number of processors had blossomed to tens of thousands working in parallel. As of June 2013, the title of fastest supercomputer was held by the Tianhe-2, which contains 3,120,000 cores and is capable of running at 33.86 petaflops (a petaflop being 10^15 floating-point operations per second).

Parallel computing is not limited to the realm of supercomputing. Today we see these concepts present in multi-core and multiprocessor desktop machines. As well as single devices, we also have clusters of independent devices, often containing a single core each, that can be connected up to work together over a network.

Since multi-core machines can be found in consumer electronics shops all across the world, we will look at these next.

Multi-core and multiprocessor machines

Machines packing multiple cores and processors are no longer just the domain of supercomputing. There is a good chance that your laptop or mobile phone contains more than one processing core, so how did we reach this point?

The mainstream adoption of parallel computing can be seen as a result of the cost of components dropping due to Moore's law. The essence of Moore's law is that the number of transistors in integrated circuits doubles roughly every 18 to 24 months.
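The doubling rule can be written as a short calculation: after a given number of years, the count has doubled (years × 12) / (doubling period in months) times. The function name `projected_transistors` and the starting count are hypothetical; this is idealized arithmetic, not data about any real chip.

```python
def projected_transistors(initial_count, years, doubling_period_months=24):
    """Idealized transistor count after `years`, doubling every period."""
    doublings = (years * 12) / doubling_period_months
    return initial_count * 2 ** doublings


if __name__ == "__main__":
    # Compare the two ends of the 18 to 24 month range over one decade.
    for period in (18, 24):
        growth = projected_transistors(1, 10, period)
        print(f"doubling every {period} months: about {growth:.0f}x in 10 years")
```

The gap between the two periods compounds quickly, which is why the choice of 18 or 24 months matters a great deal for long-range projections.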

This in turn has consistently pushed down the cost of hardware such as CPUs. As a result, manufacturers such as Dell and Apple have produced ever faster machines for the home market that easily outperform the room-sized supercomputers of old.

Computers such as the 2013 Mac Pro can contain up to twelve cores; that is, a CPU that duplicates some of its key computational components twelve times. These cost a fraction of the price that the Cray-1 did at its launch.

Devices that contain multiple cores allow us to explore parallel-based programming on a single machine. One method that allows us to leverage multiple cores is threads.

A thread can be thought of as a sequence of instructions, usually contained within a single lightweight process, that the operating system can schedule to run. From a programming perspective, this could be a separate function that runs independently from the main body of the program.
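As a minimal sketch of "a separate function that runs independently", the following Python example starts one thread alongside the main body of the program. The function `count_words` and the shared `results` dictionary are hypothetical names for this illustration, and Python's `threading` module is used here in place of the C thread libraries discussed next.

```python
import threading


def count_words(text, results, key):
    """Runs in its own thread and stores its result in a shared dict."""
    results[key] = len(text.split())


if __name__ == "__main__":
    results = {}
    background = threading.Thread(
        target=count_words,
        args=("clusters are groups of networked nodes", results, "background"),
    )
    background.start()  # the operating system may now schedule this thread

    # The main body of the program carries on with its own work meanwhile.
    main_count = len("the main body keeps going".split())

    background.join()  # wait for the thread before reading its result
    print(results["background"], main_count)
```

Note that in standard CPython a global interpreter lock prevents threads from running Python code on several cores at the same moment, so this shows the programming model rather than a speed-up.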

Thanks to the ability to use threads in application development, by the 1990s two standards had come to dominate the area of shared memory multiprocessor devices: POSIX Threads (Pthreads) and OpenMP.

POSIX Threads is a standardized C language interface for programming threads, specified in the IEEE POSIX 1003.1c standard, that can be used to implement parallelism.

The other standard is OpenMP. The OpenMP website describes it as follows:

OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs.

http://openmp.org/

What this means in practice is that OpenMP is a standard that provides an API to help deal with concerns such as multithreading and memory sharing. By including OpenMP in your project, you can write multithreaded applications without having to take care of many of the low-level implementation details that you would face when writing an application purely with Pthreads.

Commodity hardware clusters

As with single devices containing many CPUs, we also have groups of commercial off-the-shelf (COTS) computers, which can be networked together over a Local Area Network (LAN). These used to be commonly referred to as Beowulf clusters.

In the late 1990s, thanks to the drop in the cost of computer hardware, the implementation of Beowulf clusters became a popular topic, with Wired magazine publishing a how-to guide in 2000:

http://www.wired.com/wired/archive/8.12/beowulf.html

The Beowulf cluster has its origin at NASA in the early 1990s, with Beowulf being the name given to the concept of a Network Of Workstations (NOW) for scientific computing devised by Donald J. Becker and Thomas Sterling.

The implementation of commodity hardware clusters running technologies such as MPI lies behind the Raspberry Pi-based projects we will be building in this book.

Cloud computing

The next topic we will look at is cloud computing. You have probably heard the term before, as it is something of a buzzword at the moment.

At the core of the term is a set of technologies that are distributed, scalable, and metered (as with utilities), that can be run in parallel, and that often contain virtual hardware. Virtual hardware is software that mimics the role of a real hardware device and can be programmed as if it were in fact a physical machine.

Examples of virtual machine software include VirtualBox, Red Hat Enterprise Virtualization, and Parallel Virtual Machine (PVM). PVM differs from the first two in that, rather than emulating hardware, it presents a network of computers as a single parallel machine. You can learn more about PVM here:

http://www.csm.ornl.gov/pvm/

Over the past decade, many large Internet-based companies have invested in cloud technologies, the most famous perhaps being Amazon. Having realized they were underutilizing a large proportion of their data centers, Amazon implemented a cloud computing-based architecture, which eventually resulted in a platform open to the public known as Amazon Web Services (AWS).

Products such as Amazon's Elastic Compute Cloud (EC2) have opened up cloud computing to small businesses and home consumers by allowing them to rent virtual computers to run their own applications and services. This is especially useful for those interested in building their own virtual computing clusters.

Due to the elasticity of cloud computing services such as EC2, it is easy to spin up many server instances and link these together to experiment with technologies such as Hadoop.

One area where cloud computing has become of particular use, especially when implementing Hadoop, is in the processing of big data.

Big data

The term big data has come to refer to data sets spanning terabytes or more. Often found in fields ranging from genomics to astrophysics, big data sets are difficult to work with and require huge amounts of memory and computational power to query.

These data sets need to be mined for information. Parallel technologies such as MapReduce, as realized in Apache Hadoop, provide a tool for dividing a large task such as this among multiple machines. Once divided, the tasks are run to locate and compile the needed data.

Another Apache application is Hive, a data warehouse system for Hadoop that allows the use of a SQL-like language called HiveQL to query the stored data.

As more data is produced year-on-year by more computational devices ranging from sensors to cameras, the ability to handle large datasets and process them in parallel to speed up queries for data will become ever more important.

These big data problems have in turn helped push the boundaries of parallel computing further, as many companies have come into being with the purpose of helping to extract information from the sea of data that now exists.

Raspberry Pi and parallel computing


Having reviewed some of the key terms of High Performance Computing, it is now time to turn our attention to the Raspberry Pi and how and why we intend to implement many of the ideas explained so far.

This book assumes that you are familiar with the basics of the Raspberry Pi and how it works, and have a basic understanding of programming. Throughout this book, the term Raspberry Pi refers to the Model B version.

For those of you new to the device, we recommend reading a little more about it at the official Raspberry Pi home page:

http://www.raspberrypi.org/

Other topics covered in this book, such as Apache Hadoop, will also be accompanied with links to information that provides a more in-depth guide to the topic at hand.

Due to its small size and low cost, the Raspberry Pi makes a good alternative to using desktop PCs or to building a cluster in the cloud on Amazon or similar providers, which can be expensive.

The Raspberry Pi comes with a built-in Ethernet port, which allows you to connect it to a switch, router, or similar device. Multiple Raspberry Pi devices connected to a switch can then be formed into a cluster; this model will form the basis of our hardware configuration in the book.

Unlike your laptop or PC, which may contain more than one CPU, the Raspberry Pi contains just a single ARM processor; however, multiple Raspberry Pis combined give us more CPUs to work with.

One benefit of the Raspberry Pi is that it also uses SD cards as secondary storage, which can easily be copied, allowing you to create an image of the Raspberry Pi's operating system and then clone it for re-use on multiple machines. When starting out with the Raspberry Pi this is a useful feature and something that will be covered in Chapter 2, Setting Up your Raspberry Pi Software and Hardware for Parallel Computing.

The Model B contains two USB ports, allowing us to expand the device's storage capacity (and the speed of accessing the data) by using a USB hard drive instead of the SD card.

From the perspective of writing software, the Raspberry Pi can run various versions of the Linux operating system, as well as other operating systems such as FreeBSD, along with the software and tools associated with development on them. This allows us to implement the types of technology found in Beowulf clusters and other parallel systems. We shall provide an overview of these development tools next.

Programming languages and frameworks

A number of programming languages, among them Fortran, C/C++, and Java, are available on the Raspberry Pi, including via the standard repositories. These can be used for writing parallel applications using implementations of MPI, Hadoop, and the other frameworks we discussed earlier in this chapter.

Fortran, C, and C++ have a long history with parallel computing and will all be examined to varying degrees throughout the book. We will also be installing Java in order to write Hadoop-based MapReduce applications.

Fortran, due to its early use in supercomputing projects, is still popular today for parallel computing application development, as a large body of code that performs specific scientific calculations exists. In Chapter 2, Setting Up your Raspberry Pi Software and Hardware for Parallel Computing, we will provide brief instructions on installing it onto your Raspberry Pi, and a further project follows in Chapter 7, Going Further.

In Chapter 3, Parallel Computing - MPI on the Raspberry Pi, we will install MPICH and run an example C application that comes bundled with the library, which will give you the opportunity of using the Message Passing Interface (MPI).

MPI is a language-independent message-passing standard developed in the early 1990s to aid parallel computing application development. The topic of MPI will be covered in greater depth in Chapter 3, Parallel Computing - MPI on the Raspberry Pi, where we will test an application that calculates π using two Raspberry Pi devices.
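To give a feel for how a π calculation can be shared between nodes, here is a Python sketch that simulates the decomposition serially in one process. It makes no MPI calls, and it is an assumption that the integral of 4/(1+x²) over [0, 1] is the method used; the application tested in Chapter 3 is not reproduced here. The names `partial_pi` and `estimate_pi`, and the terms `rank` and `size` for a worker's index and the worker count, are illustrative.

```python
import math


def partial_pi(rank, size, intervals):
    """One worker's midpoint-rule share of the integral of 4/(1+x^2) on [0, 1]."""
    width = 1.0 / intervals
    total = 0.0
    # Worker `rank` handles intervals rank, rank + size, rank + 2*size, ...
    for i in range(rank, intervals, size):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


def estimate_pi(size, intervals):
    """Stand-in for the step where each worker sends its share to be added up."""
    return sum(partial_pi(rank, size, intervals) for rank in range(size))


if __name__ == "__main__":
    estimate = estimate_pi(size=2, intervals=10_000)
    print(estimate, abs(estimate - math.pi))
```

In a real message-passing program, each call to `partial_pi` would run on a different node, and the final `sum` would be replaced by messages carrying the partial results back to one node.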

In Chapter 4, Hadoop – Distributed Applications on the Raspberry Pi, we examine the Java programming language and Apache Hadoop in further detail. These form the final two important technologies we will cover in this book.

Apache Hadoop is an open source Java-based MapReduce framework designed for distributed parallel application development.

A MapReduce framework allows an application to take, for example, a number of data sets, divide them up, and mine each data set independently. This can take place on separate devices and then the results are combined into a single data set from which we finally extract a meaningful value.
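The divide, process, and combine flow can be sketched in plain Python as a word count. This is not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in one process purely to show the data flow.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for document in documents:  # on a cluster, each document could go to a different node
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["pi pi cluster", "cluster node"]))
```

Because each call to `map_phase` depends only on its own document, the map step is the part that can be spread across separate devices; only the grouping and reduction need to see results from all of them.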

In Chapter 5, MapReduce Applications with Hadoop and Java, we explain MapReduce in detail. The MapReduce model lends itself to being deployed on COTS clusters and cloud services such as EC2. In this book we will demonstrate how to set up Hadoop on two Raspberry Pis in order to mine for data and calculate π using a Monte Carlo simulation.
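For readers unfamiliar with the Monte Carlo approach to π, a minimal Python sketch follows. It is not the Hadoop program from the later chapters: the names `count_hits` and `monte_carlo_pi` are hypothetical, and the "workers" are simulated one after another in a single process. Random points are scattered over the unit square; the fraction landing inside the quarter circle of radius 1 approximates π/4.

```python
import math
import random


def count_hits(samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # a separate seeded generator per worker
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def monte_carlo_pi(samples_per_worker, n_workers):
    """Add up every worker's hit count and scale the ratio to estimate pi."""
    hits = sum(count_hits(samples_per_worker, seed) for seed in range(n_workers))
    return 4.0 * hits / (samples_per_worker * n_workers)


if __name__ == "__main__":
    estimate = monte_carlo_pi(samples_per_worker=50_000, n_workers=2)
    print(estimate, abs(estimate - math.pi))
```

Each worker needs only a sample count and a seed, and returns a single integer, which is why this kind of problem divides so easily across machines. The estimate improves slowly, with the error shrinking roughly in proportion to the square root of the total number of samples.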

Finally, the Appendix of this book contains a number of links and resources that the reader may find of interest for Fortran, Java, C, and C++.

Summary


This concludes our short introduction to parallel computing and the tools we will be using on Raspberry Pi.

You should now have a basic idea of some of the terms related to parallel computing and why using the Raspberry Pi is a fun and cheap way to build your own computing cluster.

Our next task will be to set up our first Raspberry Pi, including installing its operating system. Once setup is complete, we can then clone its SD card and re-use it for future machines.

So grab your hardware as the next chapter will guide you through this process.


Key benefits

  • Learn about parallel computing by building your own system using Raspberry Pi
  • Build a two-node parallel computing cluster
  • Integrate Raspberry Pi with Hadoop to build your own super cluster

Description

A cluster is a type of parallel/distributed processing system which consists of a collection of interconnected stand-alone computers working cooperatively together. Using Raspberry Pi computers, you can build a two-node parallel computing cluster which enhances performance and availability. This practical, example-oriented guide will teach you how to set up the hardware and operating systems of multiple Raspberry Pi computers to create your own cluster. It will then guide you through installing the software you need to write your own programs, such as Hadoop and MPICH, before moving on to cover topics such as MapReduce. Throughout this book, you will explore the technology with the help of practical examples and tutorials to help you learn quickly and efficiently.

Starting from a pile of hardware, you will be guided through tutorials that help you turn that hardware into your own super-computing cluster. You'll start out by learning how to set up your Raspberry Pi cluster's hardware. Following this, you will be taken through how to install the operating system, and you will also be given a taste of what parallel computing is about. With your Raspberry Pi cluster successfully set up, you will then install software such as MPI and Hadoop. Having reviewed some examples and written some programs that explore these two technologies, you will then wrap up with some fun ancillary projects. Finally, you will be provided with useful links to help take your projects to the next step.

What you will learn

  • Discover how to set up the hardware to build your parallel computing cluster
  • Set up your Raspberry Pi computers and install an operating system
  • Network your two Raspberry Pis together
  • Gain an understanding of MPI through practical examples
  • Learn how to work with MPICH to write parallel applications
  • Install Hadoop and experiment with processing text files
  • Get acquainted with MapReduce, the paradigm at the heart of Hadoop


Table of Contents

15 Chapters
Raspberry Pi Super Cluster
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Clusters, Parallel Computing, and Raspberry Pi – A Brief Background
Setting Up your Raspberry Pi Software and Hardware for Parallel Computing
Parallel Computing – MPI on the Raspberry Pi
Hadoop – Distributed Applications on the Raspberry Pi
MapReduce Applications with Hadoop and Java
Calculate Pi with Hadoop and MPI
Going Further
Appendix
Index


FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal)
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.