You're reading from Fast Data Processing with Spark 2 - Third Edition

Product type: Book
Published in: Oct 2016
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785889271
Edition: 3rd Edition
Author (1)
Holden Karau

Holden Karau is a software development engineer who is active in open source. She has worked on a variety of search, classification, and distributed systems problems at IBM, Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics degree in computer science. Other than software, she enjoys playing with fire and hula hoops, and welding.

The data scientist and Spark features


One of the interesting questions relevant to this book is, "What do data scientists want?" It is a question that is being discussed and debated in many blogs. A short answer is as follows:

  • The ability to explore, model, and reason about data at scale, because many of their algorithms get asymptotically better with more data, so a small dataset sample is not enough for exploring different algorithms

  • The ability to deploy without a lot of impedance

  • The facility to evolve models once they are in production and the real world is using them

In short, all we ask for is the shortest path from the lab to the factory, enabling a data scientist DevOps person! The following screenshot (combining talks from Josh Willis and Ian Buss), which displays The Sense & Sensibility of a Data Scientist DevOps, succinctly shows the value of Apache Spark to a data scientist by addressing three points:

Who is this data scientist DevOps person?

Of course, we really do not want to start...
