The Spark Workshop

By Craig Covey, Dayong Du, Landon Robinson, Jason Morris, Raúl Estrada, Phil Schwab, and Arush Kharbanda

About this book

You already know you want to learn Spark, and a smarter way to learn Spark is to learn by doing. The Spark Workshop focuses on building up your practical skills so that you can perform complex Spark analytics and create your own production-ready applications. You'll learn from real examples that lead to real results.

Throughout The Spark Workshop, you'll take an engaging step-by-step approach to understanding Spark. You won't have to sit through any unnecessary theory. If you're short on time, you can jump into a single exercise each day or spend an entire weekend learning about predictive analytics. It's your choice. Learning on your terms, you'll build up and reinforce key skills in a way that feels rewarding.

Every physical print copy of The Spark Workshop unlocks access to the interactive edition. With videos detailing all exercises and activities, you'll always have a guided solution. You can also benchmark yourself against assessments, track progress, and receive content updates. You'll even earn a secure credential that you can share and verify online upon completion. It's a premium learning experience that's included with your printed copy. To redeem it, follow the instructions located at the start of your Spark book.

Fast-paced and direct, The Spark Workshop is the ideal companion for newcomers to Spark. You'll build and iterate on your code like a software developer, learning along the way. This process means that your new skills will stick, embedded as best practice: a solid foundation for the years ahead.

Publication date: September 2020

About the Authors

  • Craig Covey

    Steven Craig Covey started working with Spark and Hadoop in the oil and gas industry in 2015. In 2018, he joined Sam’s Club as a software engineer in the data science and engineering group.

  • Dayong Du

    Dayong Du has dedicated his career to enterprise data and analytics for more than 10 years, focusing especially on enterprise use cases built on open source big data technologies such as Hadoop, Hive, HBase, and Spark. Dayong is a big data practitioner as well as an author and coach. He has published the first and second editions of Apache Hive Essentials and has coached many people interested in learning and using big data technology. In addition, he is a seasoned blogger, a contributor to and advisor for big data start-ups, and a co-founder of the Toronto big data professional association.

  • Landon Robinson

    Landon Robinson has worked in big data for over 5 years. He has taught Lowe’s data scientists how to build enhanced data layers and scale their machine learning models on production data. He is now a principal software engineer and tech lead of data warehousing at SpotX.

  • Jason Morris

    Jason Morris is a systems and research engineer with over 19 years of experience in system architecture, research engineering, and large data analysis. His primary focus is machine learning with TensorFlow, CUDA, and Apache Spark. Jason is also a speaker and a consultant for designing large-scale architectures, implementing security best practices in the cloud, creating near real-time image detection analytics with deep learning, and developing serverless architectures to aid in ETL. His most recent roles include solution architect, big data engineer, big data specialist, and instructor at Amazon Web Services. He is currently the Chief Technology Officer of Next Rev Technologies, and his favorite command-line program is netcat.

  • Raúl Estrada

    Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods. Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

  • Phil Schwab

    Phil Schwab has been working in the big data domain for around eight years. His first two jobs were at eBay and Apple in Silicon Valley. He recently joined Unravel Data, a startup headquartered in Silicon Valley that develops a leading performance management platform for big data applications and Spark.

  • Arush Kharbanda

    Arush has worked with Apache Spark for over 6 years. In 2019, he co-founded Ignosi Technologies with the mission of making AI accessible. At Ignosi, he leads efforts to build distributed deep learning solutions. Arush has worked for various multinational companies, including Sigmoid Analytics, Philips, and Innovaccer. During his tenure at these companies, he developed petabyte-scale data processing pipelines that use Spark to process both batches and streams of data. He has helped various organizations make their data processing pipelines performant and stable, and he has developed various machine learning and deep learning models, both with and without Apache Spark. He has also worked as a consultant, helping organizations implement machine learning and deep learning solutions at scale. Arush loves to write about technology; his blog posts have been published on various technology sites, and he often uses Quora to share his knowledge.
