Apache Spark 2: Data Processing and Real-Time Analytics

More Information
Learn
  • Get to grips with all the features of Apache Spark 2.x
  • Perform highly optimized real-time big data processing 
  • Use ML and DL techniques with Spark MLlib and third-party tools
  • Analyze structured and unstructured data using SparkSQL and GraphX
  • Understand tuning, debugging, and monitoring of big data applications 
  • Build scalable and fault-tolerant streaming applications 
  • Develop scalable recommendation engines
About

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform.

You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools.

By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle.

This Learning Path includes content from the following Packt products:

  • Mastering Apache Spark 2.x by Romeo Kienzler
  • Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla
  • Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen MeiCookbook
Features
  • Master the art of real-time big data processing and machine learning 
  • Explore a wide range of use-cases to analyze large data 
  • Discover ways to optimize your work by using many features of Spark 2.x and Scala
Page Count 616
Course Length 18 hours 28 minutes
ISBN 9781789959208
Date Of Publication 21 Dec 2018

Authors

Siamak Amirghodsi

Siamak Amirghodsi (Sammy) is interested in building advanced technical teams, executive management, Spark, Hadoop, big data analytics, AI, deep learning nets, TensorFlow, cognitive models, swarm algorithms, real-time streaming systems, quantum computing, financial risk management, trading signal discovery, econometrics, long-term financial cycles, IoT, blockchain, probabilistic graphical models, cryptography, and NLP.

Romeo Kienzler

Romeo Kienzler works as the chief data scientist in the IBM Watson IoT worldwide team, helping clients to apply advanced machine learning at scale on their IoT sensor data. He holds a Master's degree in computer science from the Swiss Federal Institute of Technology, Zurich, with a specialization in information systems, bioinformatics, and applied statistics.

Md. Rezaul Karim

Md. Rezaul Karim is a researcher, author, and data science enthusiast with a strong computer science background, coupled with 10 years of research and development experience in machine learning, deep learning, and data mining algorithms to solve emerging bioinformatics research problems by making them explainable. He is passionate about applied machine learning, knowledge graphs, and explainable artificial intelligence (XAI). Currently, he is working as a research scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a researcher at the Insight Centre for Data Analytics, Ireland. Previously, he worked as a lead software engineer at Samsung Electronics, Korea.

Sridhar Alla

Sridhar Alla is a big data expert helping companies solve complex problems in distributed computing, large scale data science and analytics practice. He holds a bachelor's in computer science from JNTU, India. He loves writing code in Python, Scala, and Java. He also has extensive hands-on knowledge of several Hadoop-based technologies, TensorFlow, NoSQL, IoT, and deep learning.

Meenakshi Rajendran

Meenakshi Rajendran is experienced in the end-to-end delivery of data analytics and data science products for leading financial institutions. Meenakshi holds a master's degree in business administration and is a certified PMP with over 13 years of experience in global software delivery environments. Her areas of research and interest are Apache Spark, cloud, regulatory data governance, machine learning, Cassandra, and managing global data teams at scale.

Broderick Hall

Broderick Hall is a hands-on big data analytics expert and holds a master’s degree in computer science with 20 years of experience in designing and developing complex enterprise-wide software applications with real-time and regulatory requirements at a global scale. He is a deep learning early adopter and is currently working on a large-scale cloud-based data platform with deep learning net augmentation.

Shuen Mei

Shuen Mei is a big data analytic platforms expert with 15+ years of experience in designing, building, and executing large-scale, enterprise-distributed financial systems with mission-critical low-latency requirements. He is certified in the Apache Spark, Cloudera Big Data platform, including Developer, Admin, and HBase. He is also a certified AWS solutions architect with emphasis on peta-byte range real-time data platform systems.