
You're reading from In-Memory Analytics with Apache Arrow

Product type: Book
Published in: Jun 2022
Publisher: Packt
ISBN-13: 9781801071031
Edition: 1st Edition
Author (1)
Matthew Topol

Matthew Topol is an Apache Arrow contributor and a principal software architect at FactSet Research Systems, Inc. Since joining FactSet in 2009, Matt has worked in both infrastructure and application development, led development teams, and architected large-scale distributed systems for processing analytics on financial data. In his spare time, Matt likes to bash his head against a keyboard, develop and run delightfully demented games of fantasy for his victims—er—friends, and share his knowledge with anyone interested enough to listen.

Would you download a library? Of course!

As mentioned before, the Arrow project contains a variety of libraries for multiple programming languages. These official libraries let anyone work with Arrow data without implementing the Arrow format themselves, regardless of the platform and programming language they use. There are two kinds of libraries so far: those that are distinct implementations of the Arrow specification, and those that are built on top of another implementation. At the time of writing, there are distinct implementations of Arrow in C++ [3], C# [4], Go [5], Java [6], JavaScript [7], Julia [8], and Rust [9].

On top of those, there are libraries for C (GLib) [10], MATLAB [11], Python [12], R [13], and Ruby [14], all of which are built on top of the C++ library, which happens to have the most active development. As you might expect, the implementations are at different stages in terms of which features and aspects of the specification they support. The documentation helpfully provides an implementation matrix [15] showing which features are implemented in which libraries, and the matrix is updated as a given library implements more of the specification.

With so many different implementations, you might be concerned about interoperability between them. To ensure that interoperability, the various library versions are integration tested against one another via automated continuous integration (CI) jobs. Depending on the language and its stage of development, these libraries are tested on a wide variety of platforms, including but not limited to the following:

  • x86/x86-64
  • arm64
  • s390x (IBM Mainframes)
  • macOS
  • Windows (32-bit and 64-bit)
  • Debian/Ubuntu/Red Hat/CentOS

These libraries are distributed through the package managers appropriate to each language and platform, to make them as easy as possible to acquire. As a result, Arrow has seen significant adoption, whether you're a data scientist using pandas, NumPy, or Dask, or you're performing calculations and analytics using Apache Spark or Airflow. And if you'd like to try the libraries out for yourself, the Apache Software Foundation hosts various ways to download them.

Some of the channels where the libraries are made available are as follows:

When developing something that uses the Arrow libraries, keep the terms introduced a few pages ago in mind, as most of the libraries use similar terminology and naming in their application programming interfaces (APIs).
