Pig Design Patterns

Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig.

Pradeep Pasupuleti


Book Details

ISBN-13: 9781783285556
Paperback: 310 pages

Book Description

Pig Design Patterns is a comprehensive guide that enables readers to apply design patterns that simplify the creation of complex data pipelines across the various stages of data management. This book focuses on using Pig in an enterprise context, bridging the gap between theoretical understanding and practical implementation. Each chapter contains a set of design patterns that pose and then solve technical challenges relevant to enterprise use cases.

The book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, in the form of a report or a predictive model. By the end of the book, readers will appreciate Pig's power in addressing the problems encountered when creating an analytics-based data product. Each design pattern comes with a suggested solution, an analysis of the trade-offs of implementing the solution differently, an explanation of how the code works, and a discussion of the results.

Table of Contents

Chapter 1: Setting the Context for Design Patterns in Pig
Understanding design patterns
The scope of design patterns in Pig
Hadoop demystified – a quick reckoner
Pig – a quick intro
Understanding Pig through the code
Summary
Chapter 2: Data Ingest and Egress Patterns
The context of data ingest and egress
Types of data in the enterprise
Ingest and egress patterns for multistructured data
The ingress and egress patterns for the NoSQL data
The ingress and egress patterns for structured data
The ingress and egress patterns for semi-structured data
JSON ingress and egress patterns
Summary
Chapter 3: Data Profiling Patterns
Data profiling for Big Data
Rationale for using Pig in data profiling
The data type inference pattern
The basic statistical profiling pattern
The pattern-matching pattern
The string profiling pattern
The unstructured text profiling pattern
Summary
Chapter 4: Data Validation and Cleansing Patterns
Data validation and cleansing for Big Data
Choosing Pig for validation and cleansing
The constraint validation and cleansing design pattern
The regex validation and cleansing design pattern
The corrupt data validation and cleansing design pattern
The unstructured text data validation and cleansing design pattern
Summary
Chapter 5: Data Transformation Patterns
Data transformation processes
The structured-to-hierarchical transformation pattern
The data normalization pattern
The data integration pattern
The aggregation pattern
The data generalization pattern
Summary
Chapter 6: Understanding Data Reduction Patterns
Data reduction – a quick introduction
Data reduction considerations for Big Data
Dimensionality reduction – the Principal Component Analysis design pattern
Numerosity reduction – the histogram design pattern
Numerosity reduction – sampling design pattern
Numerosity reduction – clustering design pattern
Summary
Chapter 7: Advanced Patterns and Future Work
The clustering pattern
The topic discovery pattern
The natural language processing pattern
The classification pattern
Future trends
Summary

What You Will Learn

  • Understand Pig's relevance in an enterprise context
  • Use Pig in design patterns that enable data movement across platforms during and after analytical processing
  • See how Pig can co-exist with other components of the Hadoop ecosystem to create Big Data solutions using design patterns
  • Simplify the process of creating complex data pipelines using transformations, aggregations, enrichment, cleansing, filtering, reformatting, lookups, and data type conversions
  • Apply knowledge of Pig in design patterns that deal with integration of Hadoop with other systems to enable multi-platform analytics
  • Comprehend design patterns and use Pig for complex analysis of purely structured data
