Packt is pleased to announce the release of its new book Fast Data Processing with Spark, a step-by-step guide that teaches readers the different ways to interact with Spark's distributed representation of data, Resilient Distributed Datasets (RDDs). The book shows readers how to install and set up Spark on their cluster, tune a Spark installation, and effectively test distributed software. The book is 120 pages long and is competitively priced at $37.99, while the eBook is available in all the popular formats, including Kindle and PDF, for $19.54.
About the Author:
Holden Karau is a software developer from Canada currently living in San Francisco. Holden graduated from the University of Waterloo in 2009 with a Bachelor of Mathematics in Computer Science. She currently works as a Software Development Engineer at Google, and has worked on search and classification problems at Amazon. Open source development has been a passion of Holden's, and a number of her projects have been covered on Slashdot. To learn more about her, please visit her website at http://www.holdenkarau.com or her blog at http://blog.holdenkarau.com.
Spark is an open source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to iterative algorithms and interactive analysis.
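The MapReduce-style pipeline that Spark generalizes can be sketched with ordinary Python collections. The snippet below is an illustrative analogy, not actual Spark code: it mirrors the flatMap, map, and reduceByKey transformations a Spark word count would apply to an RDD (in Spark itself, `lines` would be an RDD created with `sc.textFile(...)` rather than a local list).

```python
from collections import Counter
from itertools import chain

# Hypothetical input: in Spark, `lines` would be an RDD loaded from HDFS;
# here a plain Python list stands in so the pipeline runs locally.
lines = ["to be or not to be", "to be is to do"]

# flatMap analog: split each line into individual words
words = list(chain.from_iterable(line.split() for line in lines))

# map + reduceByKey analog: count occurrences of each word
counts = Counter(words)

print(counts["to"])  # → 4
```

In Spark, the same logic stays almost identical in shape, but each step becomes a lazy transformation that the cluster executes in parallel across partitions of the data.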
Fast Data Processing with Spark covers everything from setting up a Spark cluster in a variety of situations (stand-alone, EC2, and so on) to using the interactive shell to quickly prototype distributed programs and explore the Spark API. It also examines how to use Shark, which brings Hive's SQL-like query syntax to Spark.
Fast Data Processing with Spark covers the following topics:
Chapter 1: Installing Spark and Setting Up Your Cluster
Chapter 2: Using the Spark Shell
Chapter 3: Building and Running a Spark Application
Chapter 4: Creating a Spark Context
Chapter 5: Loading and Saving Data in Spark
Chapter 6: Manipulating Your RDD
Chapter 7: Shark – Using Spark with Hive
Chapter 8: Testing
Chapter 9: Tips and Tricks
Fast Data Processing with Spark is ideal for software developers who want to learn how to write distributed programs with Spark. No previous experience with distributed programming is necessary. This book assumes that readers have a working knowledge of Java, Scala, or Python. To learn more about the book, please visit: http://www.packtpub.com/fast-data-processing-with-spark/book
Fast Data Processing with Spark covers how to write distributed, MapReduce-style programs with Spark.
For more information, please visit: http://www.packtpub.com/fast-data-processing-with-spark/book