
You're reading from Scalable Data Streaming with Amazon Kinesis

Product type: Book
Published in: Mar 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800565401
Edition: 1st Edition
Authors (4):
Tarik Makota

Tarik Makota hails from a small town in Bosnia. He is a Principal Solutions Architect with Amazon Web Services, builder, writer, and the self-proclaimed best fly fisherman at AWS. Never a perfect student, he managed to earn a Master of Science in Software Development and Management from RIT. When he is not “doing the cloud” or writing, Tarik spends most of his time fly fishing in pursuit of slippery trout. He feeds his addiction by spending summers in Montana. Tarik lives in New Jersey with his family, Mersiha, Hana, and two exceptionally perfect dogs.

Brian Maguire

Brian Maguire is a Solution Architect at Amazon Web Services, where he is focused on helping customers build solutions in the cloud. He is a technologist, writer, teacher, and student who loves learning. Brian lives in New Hope, Pennsylvania, with his family, Lorna, Ciara, Chris, and several cats.

Danny Gagne

Danny Gagne is a Solutions Architect at Amazon Web Services. He has extensive experience in the design and implementation of large-scale high-performance analysis systems. He lives in New York City.

Rajeev Chakrabarti

Rajeev Chakrabarti is a Principal Developer Advocate with the Amazon Kinesis and the Amazon MSK team. He has worked for many years in the Big Data and Data Streaming space. Before joining the Amazon Kinesis team, he was a streaming Specialist Solution Architect helping customers build streaming pipelines. He lives in New Jersey with his family, Shaifalee and Anushka.


Understanding data format conversion in KDF

KDF can convert incoming JSON data to either Apache Parquet (Parquet) or Apache ORC (ORC) format. Parquet and ORC are popular columnar formats, as opposed to JSON or Comma Separated Values (CSV), which are row formats. Columnar formats offer several advantages over row formats in both storage and query speed, especially in big-data use cases. In row formats, the data for all columns in a row is stored together, so a query that needs only a subset of columns must still read the data for every column and then filter out the unneeded ones. In columnar formats, data is stored by column, so only the columns specified in a query need to be retrieved. As a result, less data is scanned to return query results and reads are more sequential, both of which improve performance. In addition, since the values within a column tend to be similar, columnar formats also compress better. This results in space saving...
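To make the row-versus-columnar argument concrete, here is a minimal, self-contained Python sketch. It is not how Parquet, ORC, or KDF work internally; it simply serializes some hypothetical sensor records as JSON in two layouts (one JSON document per record versus one JSON array per column) and counts the bytes a single-column query has to read in each. The record fields and the helper names (`to_rows`, `to_columns`, `bytes_scanned_rows`, `bytes_scanned_columns`) are illustrative assumptions.

```python
import json

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {
        "sensor_id": f"s-{i % 3}",
        "temperature": 20 + i % 5,
        "status": "OK",
        "note": "routine reading",
    }
    for i in range(1000)
]


def to_rows(records):
    """Row layout: each record is serialized as one unit, like JSON lines."""
    return [json.dumps(r).encode("utf-8") for r in records]


def to_columns(records):
    """Columnar layout: all values of one column are serialized together."""
    columns = list(records[0])
    return {c: json.dumps([r[c] for r in records]).encode("utf-8") for c in columns}


def bytes_scanned_rows(rows, wanted):
    """A row layout must read every whole row, then drop the unwanted columns."""
    total = 0
    for raw in rows:
        total += len(raw)  # the full row is read, whatever the query needs
        record = json.loads(raw)
        _projected = {c: record[c] for c in wanted}  # unwanted columns discarded
    return total


def bytes_scanned_columns(columns, wanted):
    """A columnar layout reads only the column chunks the query asks for."""
    return sum(len(columns[c]) for c in wanted)


if __name__ == "__main__":
    rows = to_rows(RECORDS)
    cols = to_columns(RECORDS)
    query = ["temperature"]
    print("row layout bytes read:", bytes_scanned_rows(rows, query))
    print("columnar bytes read:  ", bytes_scanned_columns(cols, query))
```

With these toy records, the columnar layout reads only the `temperature` chunk, while the row layout reads every byte of every record. Real columnar formats such as Parquet and ORC add per-column encoding and compression on top of this basic layout.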
