Home

Programming

Protocol Buffers Handbook

By Clément Jean

Book

eBook $31.99 $21.99

Print $39.99

Subscription $15.99 $10 p/m for three months

BUY NOW

$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!

What do you get with a Packt Subscription?

This book & 7000+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook + Subscription?

Download this book in EPUB and PDF formats, plus a monthly download credit

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook?

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Access this title in our online reader

Online reader with customised display settings for better reading experience

What do you get with video?

Download this video in MP4 format

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with video?

Stream this video

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with Audiobook?

Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF

What do you get with Exam Trainer?

Flashcards, Mock exams, Exam Tips, Practice Questions

Access these resources with our interactive certification platform

Mobile compatible-Practice whenever, wherever, however you want

BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!

eBook $31.99 $21.99

Print $39.99

Subscription $15.99 $10 p/m for three months

What do you get with a Packt Subscription?

This book & 7000+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook + Subscription?

Download this book in EPUB and PDF formats, plus a monthly download credit

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with a Packt Subscription?

This book & 6500+ ebooks & video courses on 1000+ technologies

60+ curated reading lists for various learning paths

50+ new titles added every month on new and emerging tech

Early Access to eBooks as they are being written

Personalised content suggestions

Customised display settings for better reading experience

50+ new titles added every month on new and emerging tech

Playlists, Notes and Bookmarks to easily manage your learning

Mobile App with offline access

What do you get with eBook?

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Download this book in EPUB and PDF formats

Access this title in our online reader

DRM FREE - Read whenever, wherever and however you want

Online reader with customised display settings for better reading experience

What do I get with Print?

Get a paperback copy of the book delivered to your specified Address*

Access this title in our online reader

Online reader with customised display settings for better reading experience

What do you get with video?

Download this video in MP4 format

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with video?

Stream this video

Access this title in our online reader

DRM FREE - Watch whenever, wherever and however you want

Online reader with customised display settings for better learning experience

What do you get with Audiobook?

Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF

What do you get with Exam Trainer?

Flashcards, Mock exams, Exam Tips, Practice Questions

Access these resources with our interactive certification platform

Mobile compatible-Practice whenever, wherever, however you want

About this book

Explore how Protocol Buffers (Protobuf) serialize structured data and provides a language-neutral, platform-neutral, and extensible solution. With this guide to mastering Protobuf, you'll build your skills to effectively serialize, transmit, and manage data across diverse platforms and languages. This book will help you enter the world of Protocol Buffers by unraveling the intricate nuances of Protobuf syntax and showing you how to define complex data structures. As you progress, you’ll learn schema evolution, ensuring seamless compatibility as your projects evolve. The book also covers advanced topics such as custom options and plugins, allowing you to tailor validation processes to your specific requirements. You’ll understand how to automate project builds using cutting-edge tools such as Buf and Bazel, streamlining your development workflow. With hands-on projects in Go and Python programming, you’ll learn how to practically apply Protobuf concepts. Later chapters will show you how to integrate data interchange capabilities across different programming languages, enabling efficient collaboration and system interoperability. By the end of this book, you’ll have a solid understanding of Protobuf internals, enabling you to discern when and how to use and redefine your approach to data serialization.

Publication date:: April 2024
Publisher: Packt
Pages: 226
ISBN: 9781805124672
Download code from GitHub

Serialization Primer

Welcome to the first chapter of this book: Serialization Primer. Whether you are experienced in serialization and deserialization or new to the concept, we are here to learn what these concepts are and how they relate to Protocol Buffers (Protobuf). So, let’s get started!

At its heart, Protobuf is all about serialization and deserialization. So, before starting to write our own proto files, generate code, and use it in code, we will need to understand what Protobuf is used for and why it is used. In this primer, we will lay the foundations so that we can build up our knowledge of Protobuf on top of them.

In this chapter, we are going to cover the following topics:

Serialization’s goals
Why use Protobuf?

At the end of this chapter, you will be confident about your knowledge of serialization and deserialization. You will understand what Protobuf is doing and why it is important to have such a data format in our software engineer toolbox.

Technical requirements

All the code examples that you will see in this section can be found under the directory called chapter1 in the GitHub repository (https://github.com/PacktPublishing/Protocol-Buffers-Handbook).

Serialization’s goals

Serialization and its counterpart, deserialization, enable us to handle data seamlessly across time and machines. This is highly relevant to current workloads because we have more and more data that we process in batches and we have more and more distributed systems that need to send data on the wire. In this section, we will dive a little bit deeper into serialization to understand how it can achieve encoding of data for later use.

How does it all work?

In order to process data across time and machines, we need to take data that is in volatile memory (memory that needs constant power in order to retain data) and save it to non-volatile memory (memory that retains stored information even after power is removed) or send data that is in volatile memory over the wire. And because we need to be able to make sense of the data that was saved to a file or sent over the wire, this data needs to have a certain structure.

The structure of the data is called the data format. It is a way of encoding data that is understood by the deserialization process in order to recreate the original data. Protobuf is one such data format.

Once this data is encoded in a certain way, we can save it to a file, send it across the network, and so on. And when the time comes to read this data, deserialization kicks in. It will take the data encoded following the data format rules (serialized data) and turn it back into data in volatile memory. Figure 1.1 summarizes graphically the data lifecycle during serialization and deserialization.

Figure 1.1 – Data lifecycle during serialization/deserialization

To summarize, in order to treat data across time and machines, we need to serialize it. This means that we need to take some sort of data and encode it with rules given by a specific data format. And after that, when we want to finally reuse the data, we deserialize the encoded data. This is the big picture of serialization and deserialization. Let us now see what the different data formats are and the metrics we can use in order to compare them.

The different data formats

As mentioned earlier in this chapter, a data format is a way of encoding data that can be interpreted by both the application engaged in serialization and the one involved in deserialization. There are multiple such formats, so it is impractical to list them all. However, we can distribute all of them into two categories:

Text
Binary

The most commonly used text data formats are JSON and XML. And for binary, there is Protobuf, but you might also know Apache Avro, Cap’n Proto, and others.

As with everything in software engineering, there are trade-offs for using one or the other. But before diving into these trade-offs, we need to understand the criteria on which we will base our comparison. Here are the criteria:

Serialized data size: This is the size in bytes of the data after serialization
Availability of data: This is the time it takes to deserialize data and use it in our code
Readability of data: As humans, is it possible to read the serialized data without too much effort?

On top of that, certain data formats, such as Protobuf, rely on data schema to define the structure of the data. This brings another set of criteria that can be used to determine which data format is the best for your use case:

Type safety: Can we make sure, at compile time, that the provided data is correct?
Readability of schema: As humans, is it possible to read the schema without too much effort?

Let us now dive deeper into each criterion so that, later on, we can evaluate how Protobuf is doing against the competition.

Serialized data size

As the goal of serializing data is to send it across the wire or save it in non-volatile storage, there is a direct benefit of having a smaller payload size: we save disk space and/or bandwidth. In other words, this means that with the same amount of storage or bandwidth, the smaller payloads allow us to save/send more data.

Now, it is important to mention that, depending on the context, it is not always a good thing to have the smallest serialized data size. We are going to talk about this in the next subsection, but serializing to a smaller number of bytes means going away from the domain of human readability and going more toward the domain of machine readability.

An example of this can be illustrated if we take the following JavaScript Object Notation (JSON):

{
  "person": {
    "name": "Clement"
  }
}

Let’s compare it with the hexadecimal representation of Protobuf serialized data, which is the same as the previous data:

0a 07 43 6c 65 6d 65 6e 74

In the former code block, we can see the structure of the object clearly, but the latter is serialized as a smaller number of bytes (9 instead of 39 bytes). There is a trade-off here and while our example is not showing real-life data, we can get the idea of size saving bandwidth/storage while at the same time affecting readability.

Finally, serializing to a smaller number of bytes has another benefit which is that, since there are fewer bytes, we can go over all the data in less time and thus deserialize it faster. This is linked to the concept of availability of data, and this is what we are going to see now.

Availability of data

It is not a surprise that the bigger the number of bytes we deal with, the more processing power it will take. It is important to understand that deserialization is not free and that the more human-readable data is, the more data massaging will be required and thus more processing power will be used.

The availability of data is the time it takes between deserializing data and being able to act on it in code. If we were to write pseudo code for calculating the availability of data, we would have something like the following code block:

start = timeNowMs()
deserialize(data)
end = timeNowMs()
availability = end - start

It is as simple as this. Now, I believe we can agree that there is not much of a trade-off here. As impatient and/or performance-driven programmers, we prefer to lower the time it takes to deserialize data. The faster we get our data, the less it costs in bandwidth, developer time, and so on.

Readability of data

We mentioned the readability of data when we were talking about data serialization size. Let us now dive deeper into what it means. The readability of data refers to how easy it is to understand the structure of the data after serialization. While readability and understandability are subjective, there are formats that are inherently harder than others.

Text formats are generally easier to understand because they are designed for humans. In these formats, we want to be able to quickly edit values with a simple text editor. This is the case for XML, JSON, YAML, and others. Furthermore, the ownership of objects is clearly defined by nesting objects into other ones. All of this is very intuitive, and it makes the data human readable.

On the other hand, binary formats are designed to be read by computers. While it is feasible to read them with knowledge of the format and with some extra effort, the overall goal is to make the computer serialize/deserialize the data faster. Open such serialized data with a text editor and you will see, sometimes, it is not even possible to distinguish bytes. Instead, you would have to use tools such as hexdump to see the hexadecimal representation of the binary.

To summarize, the readability of data is a trade-off that mainly focuses on who the reader of the data is. If it is humans, go with text formats, and if it is machines, choose binary since they will be able to understand the data faster.

Type safety

As mentioned, some data formats rely on us to describe the data through schemas. If you have worked with JSON Schema or any other kind of schema, you know that we can explicitly set types to fields in order to limit the possible value. For example, say we have the following Protobuf schema (note that this is not correct because it is simplified):

message Person {
  string name;
}

With this Protobuf schema, we would be only able to set strings to the name field, not integers, floats, or others. On the other hand, say we add the following JSON (without JSON Schema):

{
  "person": {
    "name": "Clément"
  }
}

Nothing could stop us from setting names to a number, an array, or anything else.

This is the idea of type safety. Generally, we are trying to make the feedback loop for developers shorter. If we are setting the wrong type of data to a field, the earlier we know about it, the better. We should not be surprised by an exception when the code is in production.

However, it is important to note that there are also benefits to having dynamic typing. The main one is considered to be productivity. It is way faster to hack and iterate with JSON than to first define the schema, generate code, and import the generated code into the user code.

Once again, this is a trade-off. But this time, this is more about the maintainability and scale of your project. If you have a small project, why incur the overhead of having the extra complexity? But if you have a medium/large size project, you want to have a more rigorous process that will shorten the feedback loop for your developers.

Readability of schema

Finally, we can talk about the readability of the schema. Once again this is a subjective matter but there are points that are worth mentioning. The readability of the schema can be described as how easy it is to understand the relationship between objects and what the data represents.

Although it may appear that this applies solely to formats reliant on schemas, that’s not the case. In reality, certain data formats are both the schema and the data they represent; JSON is an example of that. With nesting and typing of data, JSON can represent objects and their relationships, as well as what the internal data looks like. An example of that is the following JSON:

{
  "person": {
    "name": "Clément",
    "age": 100,
    "friends": [
      { "name": "Mark" },
      { "name": "John" }
    ]
  }
}

It describes a person whose name (a string) is Clement, whose age (a number) is 100, and who has friends (an array) called Mark and John. As you can see, this is very intuitive. However, such a schema is not type-safe.

If we care about type safety, then we cannot choose to use straight JSON. We could set the wrong kind of data to the name, age, and friends fields. Furthermore, nothing indicates whether the objects in the friends fields are persons. This is mostly because, in JSON, we cannot create definitions of objects and use the types. All types are implicit.

Now, consider the following Protobuf schema (note that this is simplified):

message Person {
  string name;
  uint32 age;
  repeated Person friends; // equivalent of a list
}

Here we only have the schema, no data. However, we can quickly see that we have the explicit string, uint32, and Person types. This will prevent a lot of bugs by catching them at compile time. We simply cannot set a string to age, a number to name, and so on.

There are a lot more criteria for the readability of schema (boilerplate, language concepts, etc.) but we can probably agree on the fact that, for onboarding developers and maintaining the project, the type explicitness is important and will catch bugs early in the development process.

To summarize, we got ourselves another trade-off. The added complexity of an external schema might be too much for small projects. However, for medium and large projects the benefits that come with type-safe schema are worth the trouble of spending a little bit more time on schema definition.

You probably noticed, but there are a lot of trade-offs. All of these are fueled by business requirements but also by subjectivity. In this section, we saw the five important trade-offs that you need to consider before choosing a data format for your project. We saw that the size of serialized data is important to consider in order to save resources (e.g. storage and bandwidth). Then, we saw what is the availability of data in the context of deserialization and how the size of the serialized data will impact it. After that, we talked about the readability of serialized data. We said that whether the size of serialized data matters depends on who is reading the data. And finally, we talked about the readability and type safety of the schema. By having explicit types, we can make sure that only the right data gets serialized and deserialized, but it also makes reading the schema itself more approachable for new developers.

What about Protobuf?

So far, we have talked only a little bit about Protobuf. In this section, we are going to start explaining the advantages and disadvantages of Protobuf, and we are going to base the analysis on the criteria we saw in the previous section.

Serialized data size

Due to a lot of optimization and its binary format, Protobuf is one of the best formats in the area. This is especially true for serializing numbers. Let us see why this is the case.

The main reason why Protobuf is good at serializing data in a small number of bytes is that it is a binary format. In Protobuf, additional structural elements such as curly braces, colons, and brackets, typically found in formats such as JSON, XML, and YAML, are absent. Instead, Protobuf data is represented simply as raw bytes.

On top of that, we have optimization such as bitpacking and the use of varints, which help a lot. It makes serializing the same data to a smaller number of bytes easier than it would be if we naively serialized the binary representation of the data as it is in memory.

Bitpacking is a method that compresses data into as few bits as possible. Notice that we are talking about bits and not bytes here. You can think of bitpacking as trying to put multiple pieces of data in the same number of bytes. Here is an example of bitpacking in Protobuf:

0a  -> 0101 00000
    -> 010 is 2 (length-delimited type like string)
    -> 1 is the field tag

Do not worry too much about what this data is representing just yet. We are going to see that in Chapter 5, Serialization, when we talk about the internals of serialization. The most important thing to understand is that we have one byte containing two pieces of metadata. At scale, this reduces the payload size quite a lot.

Most of the integers in Protobuf are varints. Varints is short for variable-size integers and they map integers to different numbers of bytes. How this works is that the smaller values will get mapped to a smaller number of bytes and the bigger ones will get translated to a larger number of bytes. Let us see an example (decimal -> byte(s) in hexadecimal):

1 -> 01
128 -> 80 01
16,384 -> 80 80 01
...

As you can see, compared to fixed-size integers (four or eight bytes), it saves space in a lot of cases. However, this also can result in non-efficient compression of the data. We can also store a 32-bit number (normally 4 bytes) in 5 bytes. Here is an example of such a case:

2,147,483,647 (max 32-bit number value) -> ff ff ff ff 07

We have five bytes instead of four if we serialize with a fixed-size integer.

While we did not get into too much detail about the internals of serialization that make the serialized data size smaller, we learned about the two main optimizations that are used in Protobuf. We saw that it uses bitpacking to compress data into smaller amounts of bits. This is what Protobuf does to limit the impact of metadata on serialized data size. Then, we saw that Protobuf also uses varints. They map smaller integers to smaller numbers of bytes. This is how most integers are serialized in Protobuf.

Availability of data

Deserializing data in Protobuf is fast because we are parsing binary instead of text. Now, this is hard to demonstrate because it is highly dependent on the programming language you are using and the data you have. However, a study conducted by Auth0 (https://auth0.com/blog/beating-json-performance-with-protobuf/) showed that Protobuf, in their use case, has better availability than JSON.

What is interesting in this study is that they found that Protobuf messages were available in their JavaScript code in 4% less time than JSON. Now, this might seem like a marginal improvement, but you have to consider the fact that JSON is JavaScript object literal format. This means that by deserializing Protobuf in JavaScript, we have to convert binary to JSON in order to access the data in our code. And even with that conversion overhead, Protobuf manages to be faster than parsing JSON directly.

This shows that, in JavaScript (JS), even with the overhead of having to transform binary to JSON, Protobuf performs better than JSON itself.

Be aware that, for every implementation, you will have different numbers. Some are more optimized than others. However, a lot of implementations are just wrappers around the C implementation, which means you will get quasi-consistent deserialization.

Finally, it is important to mention that serialization and deserialization speeds depend a lot on the underlying data. Some data formats perform better on certain kinds of data. So, if you are considering using any data schema, I urge you to do some benchmarking first. It will prevent you from being surprised.

Readability of data

As you might expect, readability of data is not where Protobuf shines. As the serialized data is in binary, it requires more effort or tools for humans to be able to understand it. Let’s take a look at the following hexadecimal:

0a 07 43 6c 65 6d 65 6e 74 10 64 1a 06 0a 04 4d 61 72 6b 1a 06 0a 04 4a 6f 68 6e

We have no way to guess that the preceding hexadecimal represents a Person object.

Now, while the default way of serializing data is not human-readable, Protobuf provides libraries to serialize data into JSON or the Protobuf text format. We can take the previous data that we had in hexadecimal and get text output similar to that in the following (Protobuf text format) code block:

name: "Clément"
age: 100
friends: {
  name: "Mark"
}
friends: {
  name: "John"
}

Furthermore, this text format can also be used to create binary. This means that we could have a configuration file written in text and still use the data with Protobuf.

So, as we saw, Protobuf serialized data is by default not human-readable. It is binary and it would take some extra effort or more tools to read. However, since Protobuf provides a way to serialize data to its own text format or to JSON, we can still have human-readable serialized data. This is nice for configuration files, test data, and so on.

Type safety

An advantage that schema-backed data formats have is that the schemas are generally defined with types in mind. Protobuf can define objects called messages that contain fields that are themselves typed. We saw a few examples previously in this chapter. This means that we will be able to check at compile time that the values provided for certain fields are correct.

On top of that, since Protobuf has an extensive set of types, there is the possibility to optimize for specific use cases. An example of this is the type for the age field that we saw in Person. In JSON Schema, for example, we would have a type called integer; however, this does not ensure that we do not provide negative numbers. In Protobuf, we could use uint32 (unsigned 32-bit integer) or a uint64 (usigned 64-bit integer) and thus we would, at compile time, ensure that we do not get negative numbers.

Finally, Protobuf also provides the possibility to nest types and reference these types by their fully qualified name. This basically means that the types have scopes. Let us take an example to make things clearer. Let us say that we have the following definitions (note that this is simplified):

message PhoneNumber {
  enum Type {
    Mobile,
    Work,
    Fax
  }
  Type type;
  string number;
}
message Email {
  enum Type {
    Personal,
    Work
  }
  Type type;
  string email;
}

As we can see, we defined two enums called Type. It is necessary because PhoneNumber supports Fax and Mobile but Email does not. Now, here when we are dealing with the Type enum in PhoneNumber, we are going to refer to it as PhoneNumber.Type, and when we deal with the one in Email, we will refer to it as Email.Type. These names are the fully qualified names of the types and they differentiate the two types, which have the same name, by providing a scope.

Now, let us think about what would happen if we had the following definitions:

enum Type {
  Mobile,
  Work,
  Fax,
  Personal
}
message PhoneNumber {
  Type type;
  string number;
}
message Email {
  Type type;
  string email;
}

We could still create valid Email instances and PhoneNumber instances because we have all the types in the Type enum. However, in this case, we could also create invalid Emails and PhoneNumbers. For example, it is possible to create an Email type with the Mobile type. This means that by providing scoping for types, we can have type safety without having to create two enums:

enum PhoneType { /*...*/ }
enum EmailType { /*...*/ }

This is verbose and unnecessary since Protobuf lets us create types that will be called Phone.Type and Email.Type.

To summarize, Protobuf lets us use types in a very explicit and safe way. We have an extensive set of types that we can use and that will let us ensure that our data is correct at compile time. Finally, we saw that by providing nested types referenced by their fully qualified names, we can differentiate types with the same names and ensure that only certain correct values get set to fields.

Readability of schema

Finally, Protobuf has readable schemas because it was designed as self-documenting and as a language. We can find a lot of concepts that we are already familiar with such as type, comments, and imports. All of this makes onboarding new developers and using Protobuf-backed APIs easy.

The first thing worth mentioning is that we can have comments in our schemas to describe what the fields and types are doing. One of the best examples of that is in the proto files provided in Protobuf itself. Let’s look at duration.proto (simplified for brevity) in the following code block:

// A Duration represents a signed, fixed-length span of time represented
// as a count of seconds and fractions of seconds at nanosecond
// resolution. It is independent of any calendar and concepts like "day" or "month".
message Duration {
  // Signed seconds of the span of time. Must be from -315,576,000,000
  // to +315,576,000,000 inclusive. Note: these bounds are computed from:
  // 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 
  years
  int64 seconds;
  // Signed fractions of a second at nanosecond resolution of the span
  // of time. Durations less than one second are represented with a 0
  // `seconds` field and a positive or negative `nanos` field. For 
  durations
  // of one second or more, a non-zero value for the `nanos` field 
  must be
  // of the same sign as the `seconds` field. Must be from 
  -999,999,999
  // to +999,999,999 inclusive.
  int32 nanos;
}

We can see that we have a comment explaining the purpose of Duration and two others explaining what the fields represent, what their possible values are, and so on. This is important because it helps users use the types properly and it can let us make assumptions.

Next, we can import some other schema files to reuse types. This helps a lot, especially when the project becomes bigger. Instead of having to duplicate code or write everything in one file, we can separate by concern and reuse definitions in multiple places. Furthermore, since imports use the path of the schema file to access the definition in it, we can look at the definition. Now, this requires knowing a little bit more about the compiler and its options, but other than that, we can access the code and start poking around.

Finally, as we mentioned earlier, Protobuf has explicit typing. By looking at a field, we know exactly what kind of value we need to set and what the boundaries are (if any). Furthermore, after learning a little bit about the internals of serialization in Protobuf, you will be able to optimize the types that you use in a granular way.

Summary

In this chapter, we defined what serialization and deserialization are, we saw the goals we try to achieve with them, and we talked about where Protobuf stands. We saw that there are a lot of trade-offs when choosing a data format that fulfills all the requirements. We also saw that, in serialization, we do not only care about one criterion; rather, there are many criteria that we need to look at. And finally, we saw the advantages and disadvantages of Protobuf as a data format.

In the next chapter, we will talk about the Protobuf language and all its concepts. We will see how to define types, import other schemas, and other language concepts.

Quiz

Serialization to file is the action of taking data from ___ memory and putting it in ___ memory.
1. Non-volatile, volatile
2. Volatile, non-volatile
Deserialization from file is the action of taking data from ___ memory and putting it in ___ memory.
1. Non-volatile, volatile
2. Volatile, non-volatile
What is availability in the context of serialization and deserialization?
1. The time it takes for deserialization to make the data available in your code
2. The time it takes for serialization to make the data available in a file
3. None of the above
When is serialized data considered human-readable?
1. If we can, by any means, decode the data
2. If a machine can easily read it
3. If humans can easily understand the structure of the data

Answers

About the Author

Clément Jean

Clément Jean is the CTO of Education for Ethiopia, a start-up focusing on educating K-12 students in Ethiopia. On top of that, he is also an online instructor (on Udemy, Linux Foundation, and others) teaching people about diff erent kinds of technologies. In both his occupations, he deals with technologies such as Protobuf and gRPC and how to apply them to real-life use cases. His overall goal is to empower people through education and technology.
Browse publications by this author