Make high-speed distributed computing easy with Packt’s new book and eBook

November 2013 | Open Source

Packt is pleased to announce the release of its new book Fast Data Processing with Spark, a step-by-step guide that teaches readers the different ways to interact with Spark's distributed representation of data, Resilient Distributed Datasets (RDDs). The book shows readers how to effectively test distributed software, tune a Spark installation, and install and set up Spark on their data cluster. The book is 120 pages long and competitively priced at $37.99, while the eBook is available in all popular formats, including Kindle and PDF, for $19.54.

About the Author:

Holden Karau is a software developer from Canada currently living in San Francisco. Holden graduated from the University of Waterloo in 2009 with a Bachelor of Mathematics in Computer Science. She currently works as a Software Development Engineer at Google, and has worked on search and classification problems at Amazon. Open source development has been a passion of Holden's, and a number of her projects have been covered on Slashdot. To learn more about her, please visit her website http://www.holdenkarau.com or blog http://blog.holdenkarau.com.

Spark is an open source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open source ecosystem, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm and promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly.

Fast Data Processing with Spark covers everything from setting up a Spark cluster in a variety of environments (standalone, EC2, and so on) to using the interactive shell to quickly prototype distributed programs and explore the Spark API. It also examines Shark, which brings Hive's SQL-like query syntax to Spark.
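To give a flavor of the interactive prototyping style the book teaches, here is a minimal word-count sketch. The Spark-shell form appears in comments, since it assumes a running cluster, the predefined `sc` SparkContext, and a hypothetical input path; the runnable lines below mimic the same transformation chain with plain Scala collections.

```scala
// In the Spark shell, `sc` (a SparkContext) is predefined, and the
// equivalent chain would run distributed over an RDD, e.g.:
//   val counts = sc.textFile("hdfs://...")   // hypothetical input path
//     .flatMap(_.split(" "))
//     .map(word => (word, 1))
//     .reduceByKey(_ + _)
//
// Locally, Scala collections expose a near-identical API:
val lines = Seq("spark makes fast data", "fast data processing")
val counts = lines
  .flatMap(_.split(" "))                        // split lines into words
  .groupBy(identity)                            // group occurrences of each word
  .map { case (word, ws) => (word, ws.size) }   // count per word
```

Because RDDs mirror the Scala collection API, code prototyped this way in the shell transfers almost unchanged to a distributed job.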

Fast Data Processing with Spark covers the following topics:

Chapter 1: Installing Spark and Setting Up Your Cluster

Chapter 2: Using the Spark Shell

Chapter 3: Building and Running a Spark Application

Chapter 4: Creating a Spark Context

Chapter 5: Loading and Saving Data in Spark

Chapter 6: Manipulating Your RDD

Chapter 7: Shark – Using Spark with Hive

Chapter 8: Testing

Chapter 9: Tips and Tricks

Fast Data Processing with Spark is ideal for software developers who want to learn how to write distributed programs with Spark. No previous experience with distributed programming is necessary, but the book does assume some knowledge of Java, Scala, or Python. To learn more about the book, please visit: http://www.packtpub.com/fast-data-processing-with-spark/book


Fast Data Processing with Spark
Fast Data Processing with Spark covers how to write distributed, MapReduce-style programs with Spark.

For more information, please visit: http://www.packtpub.com/fast-data-processing-with-spark/book
