You're reading from  In-Memory Analytics with Apache Arrow

Product type: Book
Published in: Jun 2022
Publisher: Packt
ISBN-13: 9781801071031
Edition: 1st Edition
Author: Matthew Topol

Matthew Topol is an Apache Arrow contributor and a principal software architect at FactSet Research Systems, Inc. Since joining FactSet in 2009, Matt has worked in both infrastructure and application development, led development teams, and architected large-scale distributed systems for processing analytics on financial data. In his spare time, Matt likes to bash his head against a keyboard, develop and run delightfully demented games of fantasy for his victims—er—friends, and share his knowledge with anyone interested enough to listen.

Storage versus runtime in-memory versus message-passing formats

When we're talking about formats for representing data, there are a few distinct, and sometimes competing, goals that we are typically trying to optimize. We can generally (over-) simplify this by talking about three main components, as follows:

  • Size: The final size of the data representation
  • Serialize/deserialize speed: How quickly data can be converted between the format and a representation that can be used in memory for computations
  • Ease of use: A catch-all category covering readability, compatibility, features, and so on
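To make the first two components concrete, here is a minimal Python sketch (not taken from the book) that measures encoded size and round-trip time for the same list of integers in two encodings. JSON text and a packed 64-bit binary layout are assumptions chosen purely for illustration, and the helper names (`measure`, `json_encode`, `binary_encode`, and so on) are hypothetical. Ease of use is not something a snippet like this can measure.

```python
import array
import json
import time


def json_encode(vals):
    # Human-readable text encoding.
    return json.dumps(vals).encode("utf-8")


def json_decode(payload):
    return json.loads(payload.decode("utf-8"))


def binary_encode(vals):
    # Fixed-width signed integers ("q" = C long long), packed back to back.
    return array.array("q", vals).tobytes()


def binary_decode(payload):
    out = array.array("q")
    out.frombytes(payload)
    return out.tolist()


def measure(name, encode, decode, vals):
    """Return the encoded size and the encode + decode time for one format."""
    start = time.perf_counter()
    payload = encode(vals)
    restored = decode(payload)
    elapsed = time.perf_counter() - start
    assert restored == vals  # the round trip must be lossless
    return {"format": name, "bytes": len(payload), "seconds": elapsed}


# Illustrative data only: 100,000 seven-digit integers.
values = list(range(1_000_000, 1_100_000))

results = [
    measure("json", json_encode, json_decode, values),
    measure("binary", binary_encode, binary_decode, values),
]

for r in results:
    print(f"{r['format']:>6}: {r['bytes']:>9} bytes, {r['seconds']:.4f} s round trip")
```

The numbers depend on the data and the machine, so treat any single run as a rough indication rather than a benchmark. The point is only that size and conversion cost can be measured separately, and that a format that does well on one is not guaranteed to do well on the other.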

How we choose to optimize between these components is usually going to be heavily dependent upon the use case for that format. When it comes to working with data, there are three high-level use case descriptions I tend to group most situations into: long-term storage, in-memory runtime processing, and message passing. Yes—these groupings are quite...
