Getting Started with Greenplum for Big Data Analytics

A hands-on guide on how to execute an analytics project from conceptualization to operationalization using Greenplum

Sunila Gollapudi

Book Details

ISBN-13: 9781782177043
Paperback: 172 pages

About This Book

  • Explore the software components and appliance modules available in Greenplum
  • Learn core Big Data architecture concepts and master data loading and processing patterns
  • Understand Big Data problems and the data science lifecycle

Who This Book Is For

"Getting Started with Greenplum for Big Data Analytics" is aimed at data scientists and data analysts who have a basic knowledge of Data Warehousing and Business Intelligence platforms, are new to Big Data, and want a solid grounding in how to use the Greenplum platform. It assumes some experience with database design and programming, as well as familiarity with analytics tools such as R and Weka.

Table of Contents

Chapter 1: Big Data, Analytics, and Data Science Life Cycle
Enterprise data
Big Data
Data analytics
Data science
References/Further reading
Summary
Chapter 2: Greenplum Unified Analytics Platform (UAP)
Big Data analytics – platform requirements
Greenplum Unified Analytics Platform (UAP)
Greenplum UAP components
Greenplum Data Computing Appliance (DCA)
Greenplum Data Integration Accelerator (DIA)
References/Further reading
Summary
Chapter 3: Advanced Analytics – Paradigms, Tools, and Techniques
Analytic paradigms
Analytics classified
Modeling methods
R programming
Weka
In-database analytics using MADlib
References/Further reading
Summary
Chapter 4: Implementing Analytics with Greenplum UAP
Data loading for Greenplum Database and HD
Greenplum table distribution and partitioning
Data Computing Appliance (DCA)
Greenplum Database management
In-database analytics options (Greenplum-specific)
Using R with Greenplum
Using Weka with Greenplum
Using MADlib with Greenplum
Using Greenplum Chorus
Pivotal
References/Further reading
Summary

What You Will Learn

  • Load data from multiple data sources using the built-in ELT/ETL capabilities
  • Learn parallel processing, MPP, and MapReduce techniques
  • Program with R and MADlib
  • Understand how backup and recovery are implemented in Greenplum
  • Optimize data processing and querying using optimal distribution and partitioning strategies
  • Exchange data between the Greenplum Database and Hadoop
  • Handle high-availability requirements on Greenplum
  • Integrate ETL, reporting, and visualization tools

In Detail

Organizations use data and analytics to gain a competitive advantage over their rivals, and as a result they are rapidly becoming more data-driven. With the advent of Big Data, existing Data Warehousing and Business Intelligence solutions are becoming obsolete, and new agile platforms that address every aspect of Big Data have become a necessity. New Big Data platforms such as Greenplum handle everything from loading and integrating data to presenting analytical visualizations and reports. What remains is for users to adjust their mindset so they can put these solutions to work.

"Getting Started with Greenplum for Big Data Analytics" is a practical, hands-on guide to learning and implementing Big Data Analytics using the Greenplum Unified Analytics Platform (UAP). It covers the full workflow, from processing structured and unstructured data to presenting the results and insights to key business stakeholders.

"Getting Started with Greenplum for Big Data Analytics" discusses the key characteristics of Big Data and its impact on current Data Warehousing platforms. It takes you through the standard Data Science project lifecycle and lays out the key requirements for an integrated analytics platform. It then explores the software and appliance components of Greenplum and explains the relevance of each component at each stage of the Data Science lifecycle.

You will also learn Big Data architectural patterns and review key advanced analytics techniques in detail. The book also covers programming with R and integrating R with Greenplum to implement analytics. Additionally, you will explore MADlib and advanced SQL techniques in Greenplum for analytics. Finally, the book covers the physical architecture of Greenplum, with guidance on handling high availability, backup, and recovery.
