In-Memory Analytics with Apache Arrow: Perform fast and efficient data analytics on both flat and hierarchical structured data

By Matthew Topol
Mex$771.99 Mex$539.99
Book Jun 2022 392 pages 1st Edition
eBook
Mex$771.99 Mex$539.99
Print
Mex$963.99
Subscription
Free Trial

What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want

Product Details


Publication date : Jun 24, 2022
Length : 392 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781801071031
Vendor : Apache

Chapter 1: Getting Started with Apache Arrow

Regardless of whether you are a data scientist/engineer, a machine learning (ML) specialist, or a software engineer trying to build something to perform data analytics, you've probably heard or read about something called Apache Arrow and either looked for more information or wondered what it was. Hopefully, this book can serve as a springboard both in understanding what Apache Arrow is and isn't, and also as a reference book to be continuously utilized in order to supercharge your analytical capabilities.

For now, let's just start off by explaining what Apache Arrow is and what you will use it for. Following that, we will walk through the Arrow specifications, set up a development environment where you can play around with the Apache Arrow libraries, and walk through a few simple exercises to get a feel for how to use them.

In this chapter, we're going to cover the following topics:

  • Understanding the Arrow format and specifications
  • Why does Arrow use a columnar in-memory format?
  • Learning the terminology and the physical memory layout
  • Arrow format versioning and stability
  • Setting up your shooting range
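
As a rough illustration of the "columnar in-memory format" topic listed above, the following sketch contrasts row-oriented records with per-column typed buffers. It uses only the Python standard library rather than the Arrow libraries, and the sample records and the `to_columns` helper are hypothetical names invented for this example; it is not how Arrow itself is implemented.

```python
from array import array

# Hypothetical sample records, invented purely for illustration.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]


def to_columns(records):
    """Pivot row-oriented records into one contiguous typed buffer per field."""
    return {
        "id": array("q", (r["id"] for r in records)),        # 64-bit signed integers
        "price": array("d", (r["price"] for r in records)),  # 64-bit floats
    }


columns = to_columns(rows)

# An aggregate over one field now scans a single contiguous buffer
# instead of visiting every record object.
total_price = sum(columns["price"])
print(total_price)
```

Keeping each field's values side by side is the general idea behind a columnar layout: an analytical operation that needs only one field reads only that field's memory.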

Technical requirements

For the portion of the chapter describing how to set up a development environment for working with the Arrow libraries, you'll need the following:

  • Your preferred Integrated Development Environment (IDE) or editor: for example, VS Code, Sublime Text, Emacs, or Vim
  • Plugins for your desired language (optional but highly recommended)
  • Interpreter or toolchain for your desired language(s):
    • Python 3+: pip and venv and/or pipenv
    • Go 1.16+
    • C++ Compiler (capable of compiling C++11 or newer)
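
A quick way to sanity-check the Python side of these requirements might look like the following sketch. The helper names `python_ok` and `module_available` are hypothetical, and `pyarrow` is assumed to be the package you intend to install for the Python Arrow library; the script only reports what it finds and installs nothing.

```python
import importlib.util
import sys


def python_ok(minimum=(3, 0)):
    """Return True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python 3+:", python_ok())
    # venv ships with CPython; pyarrow is assumed to be installed separately via pip.
    for name in ("venv", "pyarrow"):
        print(name, "available:", module_available(name))
```

If `pyarrow` is reported as unavailable, installing it inside a virtual environment keeps it isolated from the system interpreter.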

Key benefits

  • Learn about Apache Arrow's data types and interoperability with pandas and Parquet
  • Work with Apache Arrow Flight RPC, Compute, and Dataset APIs to produce and consume tabular data
  • Reviewed, contributed to, and supported by Dremio, the co-creator of Apache Arrow

Description

Apache Arrow is designed to accelerate analytics and allow the exchange of data across big data systems easily. In-Memory Analytics with Apache Arrow begins with a quick overview of the Apache Arrow format, before moving on to helping you to understand Arrow’s versatility and benefits as you walk through a variety of real-world use cases. You'll cover key tasks such as enhancing data science workflows with Arrow, using Arrow and Apache Parquet with Apache Spark and Jupyter for better performance and hassle-free data translation, as well as working with Perspective, an open source interactive graphical and tabular analysis tool for browsers. As you advance, you'll explore the different data interchange and storage formats and become well-versed with the relationships between Arrow, Parquet, Feather, Protobuf, Flatbuffers, JSON, and CSV. In addition to understanding the basic structure of the Arrow Flight and Flight SQL protocols, you'll learn about Dremio’s usage of Apache Arrow to enhance SQL analytics and discover how Arrow can be used in web-based browser apps. Finally, you'll get to grips with the upcoming features of Arrow to help you stay ahead of the curve. By the end of this book, you will have all the building blocks to create useful, efficient, and powerful analytical services and utilities with Apache Arrow.

What you will learn

  • Use Apache Arrow libraries to access data files both locally and in the cloud
  • Understand the zero-copy elements of the Apache Arrow format
  • Improve read performance by memory-mapping files with Apache Arrow
  • Produce or consume Apache Arrow data efficiently using a C API
  • Use the Apache Arrow Compute APIs to perform complex operations
  • Create Arrow Flight servers and clients for transferring data quickly
  • Build the Arrow libraries locally and contribute back to the community

Table of Contents

16 Chapters
Preface
Section 1: Overview of What Arrow Is, its Capabilities, Benefits, and Goals
Chapter 1: Getting Started with Apache Arrow
Chapter 2: Working with Key Arrow Specifications
Chapter 3: Data Science with Apache Arrow
Section 2: Interoperability with Arrow: pandas, Parquet, Flight, and Datasets
Chapter 4: Format and Memory Handling
Chapter 5: Crossing the Language Barrier with the Arrow C Data API
Chapter 6: Leveraging the Arrow Compute APIs
Chapter 7: Using the Arrow Datasets API
Chapter 8: Exploring Apache Arrow Flight RPC
Section 3: Real-World Examples, Use Cases, and Future Development
Chapter 9: Powered by Apache Arrow
Chapter 10: How to Leave Your Mark on Arrow
Chapter 11: Future Development and Plans
Other Books You May Enjoy

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, clicking the link will download and open the PDF file directly. If you don't, save the PDF file on your machine and download Adobe Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and the rights of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using a credit card, debit card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are priced lower than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.