
You're reading from Apache Spark Quick Start Guide

Product type: Book
Published in: Jan 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789349108
Edition: 1st Edition
Authors (2):
Shrey Mehrotra

Shrey Mehrotra has over 8 years of IT experience and, for the past 6 years, has been designing the architecture of cloud and big-data solutions for the finance, media, and governance sectors. Having worked on research and development with big-data labs and been part of Risk Technologies, he has gained insights into Hadoop, with a focus on Spark, HBase, and Hive. His technical strengths also include Elasticsearch, Kafka, Java, YARN, Sqoop, and Flume. He likes spending time performing research and development on different big-data technologies. He is the coauthor of the books Learning YARN and Hive Cookbook, a certified Hadoop developer, and he has also written various technical papers.

Akash Grade

Akash Grade is a data engineer living in New Delhi, India. Akash graduated with a BSc in computer science from the University of Delhi in 2011, and later earned an MSc in software engineering from BITS Pilani. He spends most of his time designing highly scalable data pipelines using big-data solutions such as Apache Spark, Hive, and Kafka. Akash is also a Databricks-certified Spark developer. He has been working on Apache Spark for the last five years, and enjoys writing applications in Python, Go, and SQL.

Spark language APIs

Spark integrates with a variety of programming languages, including Scala, Java, Python, and R, and developers can write their Spark programs in any of them. This freedom of language is one of the reasons why Spark is popular among developers. Compare this with Hadoop MapReduce, where Java was the only native choice, which made it difficult for developers coming from other programming languages to work with MapReduce.

Scala

Scala is the primary language for Spark: more than 70% of Spark's code is written in Scala, whose name is short for Scalable Language. Scala is a fairly new language. Martin Odersky began developing it in 2001, and it was first released publicly in 2004. Like Java, Scala compiles to bytecode that runs on the JVM. Scala combines the advantages of the object-oriented and functional programming worlds. Its type inference keeps code concise, much like a dynamically typed language, without compromising on type safety. As Spark is primarily written in Scala, you can find almost all of the new libraries in the Scala API.

Java

Most of us are familiar with Java, a powerful object-oriented programming language. The majority of big data frameworks are written in Java, so it offers rich libraries for connecting to these frameworks and processing their data.

Python

Python is a general-purpose programming language that supports object-oriented, procedural, and functional styles. It was developed by Guido van Rossum and was first released in 1991. For some time, Python was not widely popular among developers, but from around 2006 onward, libraries such as Numerical Python (NumPy) and pandas appeared, became cornerstones of the ecosystem, and made Python popular among all types of programmers. In Spark, when the driver launches executors on worker nodes, it also starts a Python interpreter for each executor. With RDDs, the data is first loaded into the JVM and then transferred to the Python process, and this extra transfer makes jobs slower when working with RDDs from Python.
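The RDD round trip described above can be sketched as follows. This is only a minimal illustration, assuming PySpark is installed and running in local mode; the names `square`, `squares_with_rdd`, and the application name are hypothetical and not taken from the book. Because `square` is an ordinary Python function, Spark cannot run it inside the JVM, so each record has to be handed to a Python worker process and the result sent back.

```python
def square(x):
    """Plain Python function; Spark has to run it in a Python worker process."""
    return x * x


def squares_with_rdd(numbers):
    """Apply `square` to `numbers` through the RDD API in local mode.

    Assumption: PySpark is installed. It is imported lazily so that this
    module can still be loaded on a machine without Spark.
    """
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")            # two local threads, no cluster needed
        .appName("rdd-python-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        # The RDD is managed by the JVM; map(square) forces every record
        # to be serialized to a Python worker and back again.
        rdd = spark.sparkContext.parallelize(numbers)
        return rdd.map(square).collect()
    finally:
        spark.stop()
```

On a machine with PySpark available, calling `squares_with_rdd([1, 2, 3])` runs the job locally and returns the squared values as a Python list.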

R

R is a statistical programming language. It provides a rich set of libraries for analyzing and manipulating data, which is why it is very popular among data analysts, statisticians, and data scientists. The SparkR integration gives data scientists the flexibility they need to work on big data. As with Python, SparkR creates an R process for each executor to work on the data transferred from the JVM.

SQL

Structured Query Language (SQL) is one of the most popular and powerful languages for working with tables stored in a database. SQL also enables non-programmers to work with big data. Spark provides Spark SQL, which is a distributed SQL query engine. We will learn about it in more detail in Chapter 6, Spark SQL.
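As a rough preview of what querying with Spark SQL can look like, here is a small sketch driven from Python. It assumes PySpark is installed and running in local mode; the `users` view, its columns, the `QUERY` constant, and the function `count_by_language` are all hypothetical names chosen for illustration.

```python
# Hypothetical query over a hypothetical `users` view.
QUERY = """
    SELECT language, COUNT(*) AS n
    FROM users
    GROUP BY language
    ORDER BY language
"""


def count_by_language(rows):
    """Register `rows` as a temporary view and run QUERY with Spark SQL.

    `rows` is a list of (name, language) tuples.
    Assumption: PySpark is installed; it is imported lazily so this
    module can be loaded without Spark.
    """
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("spark-sql-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(rows, ["name", "language"])
        df.createOrReplaceTempView("users")  # makes the data visible to SQL
        result = spark.sql(QUERY).collect()
        return [(row["language"], row["n"]) for row in result]
    finally:
        spark.stop()
```

With PySpark available, `count_by_language([("ann", "Python"), ("bob", "Scala"), ("cy", "Python")])` would group the three rows by language. The point of the sketch is that the analysis itself is expressed entirely in SQL, while Python only sets up the session and the view.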
