You're reading from Analytics for the Internet of Things (IoT)

Product type: Book
Published in: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787120730
Edition: 1st Edition
Author (1)
Andrew Minteer

Andrew Minteer is currently the senior director, data science and research at a leading global retail company. Prior to that, he served as the director, IoT Analytics and Machine Learning at a Fortune 500 manufacturing company. He has an MBA from Indiana University and a background in statistics, software development, database design, and cloud architecture, and he has led analytics teams for over 10 years. He first taught himself to program on an Atari 800 computer at the age of 11 and fondly remembers the frustration of waiting through 20 minutes of beeps and static to load a 100-line program. He now thoroughly enjoys launching a 1 TB GPU-backed cloud instance in a few minutes and getting right to work. Andrew is a private pilot who looks forward to spending some time in the air sometime soon. He enjoys kayaking, camping, traveling the world, and playing around with his six-year-old son and three-year-old daughter.

Apache Spark for data processing


Apache Spark is a new-ish project (at least in the world of big data, which moves at warp speed) that integrates well with Hadoop but does not necessarily require Hadoop components to operate. It is a "fast and general engine for large-scale data processing", as described on the Spark project team welcome page. The tagline of "lightning fast cluster computing" is a little catchier; we like that one better.


What is Apache Spark?

Good question, glad you asked. Spark was built for distributed cluster computing, so the same code scales from a single machine to a large cluster, generally without changes. The word "general" in the "general engine" description is very appropriate for Spark: it refers to the many and varied ways you can use it.

You can use it for ETL data processing, machine learning modeling, graph processing, stream data processing, and SQL and structured data processing. It is a boon for analytics in a distributed computing world.

It has APIs for multiple programming languages such...
