Learning Spark SQL: Architect streaming analytics and machine learning solutions

Product type Paperback

Published in Sep 2017

Publisher Packt

ISBN-13 9781785888359

Length 452 pages

Edition 1st Edition

Languages Scala

Tools Apache Spark

Concepts Data Streaming

Author Aurobindo Sarkar

Table of Contents (13 chapters)

Preface

1. Getting Started with Spark SQL

2. Using Spark SQL for Processing Structured and Semistructured Data

3. Using Spark SQL for Data Exploration

4. Using Spark SQL for Data Munging

5. Using Spark SQL in Streaming Applications

6. Using Spark SQL in Machine Learning Applications

7. Using Spark SQL in Graph Applications

8. Using Spark SQL with SparkR

9. Developing Applications with Spark SQL

10. Using Spark SQL in Deep Learning Applications

11. Tuning Spark SQL Components for Performance

12. Spark SQL in Large-Scale Application Architectures

Using Spark SQL for Data Munging

In this code-intensive chapter, we present key data munging techniques for transforming raw data into a format usable for analysis. We start with general data munging steps that apply in a wide variety of scenarios. Then we shift our focus to specific types of data, including time-series and text data, and to the preprocessing steps for Spark MLlib-based machine learning pipelines. We use several Datasets to illustrate these techniques.

In this chapter, we will cover the following topics:

What is data munging?
Explore data munging techniques
Combine data using joins
Munging on textual data
Munging on time-series data
Dealing with variable length records
Data preparation for machine learning pipelines
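To give a flavor of the general munging steps and joins listed above, here is a minimal sketch in Scala. It is not taken from the book: the `MungingSketch` object, the `munge` helper, the column names, and the sample rows are all hypothetical and chosen only for illustration. It assumes Spark 2.x or later running in local mode.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower, trim}

// Hypothetical example object; names and data are illustrative assumptions.
object MungingSketch {

  // Cleans two small in-memory tables and joins them on "id".
  def munge(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Raw customer records: stray whitespace, mixed case, one missing name.
    val customers = Seq(
      (1, Some("  Alice ")),
      (2, Some("BOB")),
      (3, None)
    ).toDF("id", "name")

    // Raw order records: one missing amount, one order with no known customer.
    val orders = Seq(
      (1, Some(20.0)),
      (1, None),
      (2, Some(5.5)),
      (4, Some(9.0))
    ).toDF("id", "amount")

    // Drop rows with a missing name, then normalize whitespace and case.
    val cleanCustomers = customers
      .na.drop(Seq("name"))
      .withColumn("name", lower(trim(col("name"))))

    // Replace missing amounts with 0.0 instead of dropping the rows.
    val cleanOrders = orders.na.fill(0.0, Seq("amount"))

    // Inner join keeps only ids present in both cleaned tables.
    cleanCustomers.join(cleanOrders, Seq("id"), "inner")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MungingSketch")
      .master("local[*]")
      .getOrCreate()

    munge(spark).show()
    spark.stop()
  }
}
```

With these sample rows, the inner join keeps three rows: two orders for customer 1 and one for customer 2. Whether to drop or fill missing values is a per-column judgment call; the choices here are arbitrary and only show the two `na` operations side by side.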

Author

Aurobindo Sarkar leads a team of data scientists and engineers at Session AI, developing cloud-based ML models for in-session marketing in e-commerce and retail. As a former CTO at multiple SaaS startups, he has architected secure, scalable, and highly available AWS cloud applications. His research interests now focus on AWS-based large-scale transformer models for NLP and HFT models for the futures and options market. Aurobindo holds a bachelor's degree in engineering from IIT Delhi, a master's in management from the Indian Institute of Science Bangalore, and a master's in computer science from New York University.