You're reading from Protocol Buffers Handbook

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781805124672
Edition: 1st Edition
Author: Clément Jean

Clément Jean is the CTO of Education for Ethiopia, a start-up focusing on educating K-12 students in Ethiopia. On top of that, he is also an online instructor (on Udemy, Linux Foundation, and others) teaching people about different kinds of technologies. In both occupations, he deals with technologies such as Protobuf and gRPC and how to apply them to real-life use cases. His overall goal is to empower people through education and technology.

The Protobuf Compiler

Now that we know the Protobuf syntax and the text format, we can finally get our hands dirty and play with the Protobuf compiler. In this chapter, we are going to generate code from .proto files, get binary from Protobuf text format, and get text from Protobuf serialized data (binary files).

In this chapter, we’re going to cover the following main topics:

  • Downloading and installing protoc
  • Transpiling .proto files
  • Specifying import paths
  • Encoding data to type with --encode
  • Decoding data to type with --decode
  • Decoding data to type, without .proto files, with --decode_raw
  • What about the other flags?

By the end of this chapter, you will know how to use protoc’s main flags. You will know how to generate C++ (or any other supported language) code, how to tell protoc where to find the imported files in your .proto files, and how to get binary data out of data described with text format.

Technical requirements

All the code examples that you will see in this section can be found in the Chapter4 directory in the GitHub repository (https://github.com/PacktPublishing/Protocol-Buffers-Handbook).

Downloading and installing protoc

Important note

For Windows users, I highly recommend you install protoc by using a package manager such as Chocolatey (https://chocolatey.org/) or any other one you want. Installing protoc header files is tricky, but they are necessary for getting Well-Known Type definitions. For Chocolatey, you should be able to run the following command:

$ choco install protoc

Before even thinking about code generation and serialization, we need to install the compiler. Depending on your needs, there are multiple ways of doing this. I am going to show two. The first one is downloading protoc from the GitHub Releases page (https://github.com/protocolbuffers/protobuf/releases), and the second one is installing it from the command line with a tool such as curl or wget.

GitHub Releases page

There, you will have a list of different precompiled binaries for different platforms (Linux, macOS) and for different architectures (arm, x86). For a given version, you will have a...

Transpiling .proto files

We are finally ready to generate some code from the .proto file. And even though we are going to use the compiler for other tasks, this is the main one you will use protoc for. In this section, we will generate code in C++ and Go. This is not a random choice. One is a directly supported language for protoc, and the other is supported by adding/downloading a protoc plugin. By seeing how to generate code for these two languages, you should be able to generate code for any other language.
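
The difference between a directly supported language and a plugin-backed one can be sketched in a few lines of Python. This is only an illustration, not part of protoc: the BUILT_IN set is a partial, hand-written list that may differ between protoc versions, and generator_name and required_plugin are hypothetical helper names. The sketch relies on one convention: for an output flag --NAME_out whose generator is not built in, protoc looks for an executable called protoc-gen-NAME on the PATH.

```python
import shutil
from typing import Optional

# Partial, illustrative list of generators compiled into protoc itself.
# ASSUMPTION: the exact set depends on the protoc version you installed.
BUILT_IN = {"cpp", "csharp", "java", "kotlin", "objc", "php", "python", "ruby"}


def generator_name(out_flag: str) -> str:
    """Extract NAME from a code generation flag shaped like --NAME_out=DIR."""
    flag = out_flag.split("=", 1)[0]
    if not (flag.startswith("--") and flag.endswith("_out")):
        raise ValueError(f"not an output flag: {out_flag}")
    return flag[2:-len("_out")]


def required_plugin(out_flag: str) -> Optional[str]:
    """Return the plugin executable the flag needs, or None if it is built in."""
    name = generator_name(out_flag)
    return None if name in BUILT_IN else f"protoc-gen-{name}"


if __name__ == "__main__":
    for flag in ("--cpp_out=.", "--go_out=."):
        plugin = required_plugin(flag)
        if plugin is None:
            print(f"{flag}: generator is built into protoc")
        else:
            location = shutil.which(plugin) or "not found on PATH"
            print(f"{flag}: needs {plugin} ({location})")
```

Running the script on your machine is a quick way to check whether the Go plugin is installed before invoking protoc with --go_out.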

Why code generation?

Before even generating code, we need to understand what the point of generating code from a .proto file is. So far, we have mostly talked about Protobuf as an abstract concept that is able to serialize and deserialize data. But, in later chapters, we will start using Protobuf serialization and deserialization in code.

To do so, instead of adding a dependency and calling exposed functions, Protobuf relies on generated code to manage all calls to the lower...

Specifying import paths

We saw that we can import files in Protobuf, but up until now, we only saw the syntax. If you do not remember, this looks like the following:

import "proto/a.proto";

Now, because the string after the import keyword is mostly a path, we might find ourselves with protoc not being aware of where this file is. This might happen in the following situations:

  • We want to keep the import path “clean,” meaning that we want all files in the project to be imported from a certain folder. For example, the proto directory is commonly used, and we could have all .proto files under this folder.
  • We want to build .proto files from a directory that cannot directly reach the imported file; for example, when .proto files are shared as libraries across multiple projects.

If you used GCC or Clang in the C/C++ world, this will feel very familiar to you. If you did not, do not worry; this is as simple as it gets.
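
To make the analogy with compiler include paths concrete, here is a minimal Python sketch of how an import string could be resolved against a list of search directories. It is a simplified model, not protoc's actual implementation: resolve_import is a hypothetical helper, and the project and shared directory names are invented for the demo. The one behavior it mirrors is that the directories are tried in the order they are given, and the first match wins.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def resolve_import(import_name: str, proto_paths: Iterable[str]) -> Optional[Path]:
    """Return the first existing file for import_name, trying each root in order."""
    for root in proto_paths:
        candidate = Path(root) / import_name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: the imported file only exists under "shared".
        project = Path(tmp) / "project"
        shared = Path(tmp) / "shared"
        (project / "proto").mkdir(parents=True)
        (shared / "proto").mkdir(parents=True)
        (shared / "proto" / "a.proto").write_text('syntax = "proto3";\n')

        # Searching only the project root fails; adding the shared root succeeds.
        print(resolve_import("proto/a.proto", [str(project)]))
        print(resolve_import("proto/a.proto", [str(project), str(shared)]))
```

The first lookup prints None because the file is not under the project root, which is the situation where protoc needs to be told about an additional import path.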

...

Encoding data to type with --encode

Now, we will start to see flags that are important for learning Protobuf and inspecting the serialized data. We will start with the flag called --encode.

As its name suggests, the --encode flag is used to encode data. It will take some data and turn it into binary (serialization). This is especially useful at this point in the book because we can inspect data without having to write code yet. We simply need protoc and the knowledge we have on how to write in Protobuf Text Format.

Our goal in this section is not so much understanding the binary produced. We care about generating it first. In the next chapter, we will talk about the binary format. So, let us just write a simple txtpb file and encode it with the --encode flag.

We will have the following textpb file (encode/user.txtpb):

id: 42
name: "Clément"

We will also have the following .proto file (encode/user.proto):

syntax = "proto3";
message User...

Decoding data to type with --decode

Similarly, we have the --decode flag, which takes binary data and returns it in text format. Once again, this flag is mostly for debugging and, in our case, for learning.

Now, remember that we already did the opposite of decode. This means that we will be able to take the output of encode, redirect it to decode, and we should get our input back. This would look like the following:

input > encode > decode > input

So, let us start with the encode part. We are already familiar with it; we can just execute the following command:

$ cat user.txtpb | protoc --encode=User user.proto

Or, on Windows with PowerShell, we could execute this command:

$ Get-Content user.txtpb | protoc --encode=User user.proto

For convenience, we will redirect the standard output of these commands to a file, like so:

$ … > user.bin
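
If you would rather not depend on a particular shell, the same pipe-and-redirect step can be sketched in Python. This is a convenience wrapper under stated assumptions, not something the book's repository provides: encode_command and encode_to_file are hypothetical helper names, and the script assumes that protoc is on the PATH and that user.proto and user.txtpb are in the current directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def encode_command(message_type: str, proto_file: str) -> List[str]:
    """Build the protoc invocation that encodes text format read from stdin."""
    return ["protoc", f"--encode={message_type}", proto_file]


def encode_to_file(message_type: str, proto_file: str, text_file: str, out_file: str) -> int:
    """Feed text_file to protoc on stdin and write the binary output to out_file."""
    text = Path(text_file).read_bytes()
    result = subprocess.run(
        encode_command(message_type, proto_file),
        input=text,               # plays the role of `cat user.txtpb |`
        stdout=subprocess.PIPE,   # plays the role of `> user.bin`
        check=True,               # raise if protoc reports an error
    )
    Path(out_file).write_bytes(result.stdout)
    return len(result.stdout)


if __name__ == "__main__":
    # Only run when the tool and the input files are actually available.
    inputs_present = Path("user.proto").is_file() and Path("user.txtpb").is_file()
    if shutil.which("protoc") and inputs_present:
        size = encode_to_file("User", "user.proto", "user.txtpb", "user.bin")
        print(f"wrote {size} bytes to user.bin")
    else:
        print("protoc, user.proto, or user.txtpb not found; nothing encoded")
```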

With that, we can now see how to use --decode. It is very similar to --encode. It takes...

Decoding data to type without .proto files, with --decode_raw

The final flag that I want to present here is --decode_raw. Now, before even getting to why we would want this flag, it is important to recognize the constraints of --encode and --decode. There are two of them.

The first one is that we need to know which type the data needs to be serialized into or was serialized into. In situations where you are trying to reverse engineer a solution or where you do not have much documentation, it is effectively impossible to use these two flags.

An example might be useful. Let us say that you find a file called an_app.preferences_pb on your Android phone (by the way, this is a real thing; check https://developer.android.com/codelabs/android-proto-datastore). You are not the developer of “an_app” but you still want to inspect the file and make sure that it is not storing sensitive information in “plain text.” Now, you read this book, and you are thinking that...
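
To give a feel for what decoding without a schema involves, here is a minimal Python sketch that walks serialized bytes and reports field numbers with their raw values. It is not how protoc implements --decode_raw, and the real flag does more (for instance, it attempts to display nested messages). The read_varint and decode_raw function names are hypothetical, and the sample bytes are hand-built for illustration, assuming a message with a varint field 1 set to 42 and a length-delimited field 2 holding a UTF-8 string; they are not claimed to be the output of any file in this chapter.

```python
from typing import List, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # the high bit is clear on the last byte
            return result, pos
        shift += 7


def decode_raw(buf: bytes) -> List[Tuple[int, Union[int, bytes]]]:
    """Return (field number, raw value) pairs without knowing any field names."""
    fields: List[Tuple[int, Union[int, bytes]]] = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        value: Union[int, bytes]
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, value))
    return fields


if __name__ == "__main__":
    # Hand-built sample: field 1 = 42 (varint), field 2 = "Clément" (UTF-8 bytes).
    sample = bytes.fromhex("082a1208436cc3a96d656e74")
    for number, value in decode_raw(sample):
        print(number, value)
```

Notice that the output only contains field numbers, never field names: names live in the .proto file, which is exactly the piece of information we are missing in this scenario.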

What about the other flags?

Obviously, after taking a look at the output of protoc --help, you cannot help but wonder what all these other flags are doing. For the sake of brevity, I do not cover them all here, but I thought it would be nice to mention some other flags and let you play with them. Consider this as a mini-challenge.

The first one that I particularly like is --descriptor_set_out. Now, we have not talked about Descriptor types yet. We will see them in more detail later in the book when we manipulate them, but for now, all you need to know is that they are messages that represent Protobuf schema constructs. What this means is that we can encode the schema itself into binary.

For this mini-challenge, you will need to write a .proto file and encode it to binary with --descriptor_set_out. Once this is done, you will need to use --decode to inspect the content. Note that you have access to the .proto file where FileDescriptorSet (the type it serializes to) is defined...

Summary

In this chapter, we learned how to use protoc to generate code and serialize/deserialize data. We saw that we can generate code for directly supported languages and ones that need plugins installed. We then saw how to encode and decode data when we have access to the .proto file and, therefore, the type definitions. And finally, we saw that even if we do not have the .proto file and type definitions, we can get a semi-readable text format output that helps us get a feel for what data is encoded.

In the next chapter, we will learn the serialization internals. We will dive deep into the binary and understand how each kind of data is serialized/deserialized, relying on the skills that we learned in this chapter.

Quiz

  1. What is the short version of the --proto_path flag?
    A. -I
    B. -p
    C. -1
  2. Which flag would generate Java code?
    A. --java-out
    B. --java-gen-out
    C. --java_out
  3. When would you use --decode_raw instead of --decode?
    A. When I do not have access to the .proto file
    B. When I do not know which type the data was serialized into
    C. Both of these

Answers

  1. A
  2. C
  3. C

Challenge solutions

Challenge 1 – Descriptors

In this challenge, we need to use the --descriptor_set_out flag to generate binary out of our schema. Let us define a simple schema (name.proto):

syntax = "proto3";
message Name {
  string name = 1;
}

To generate a FileDescriptorSet out of it, we need to run the following command:

$ protoc --descriptor_set_out=name.desc name.proto

This will create a name.desc file that we can then analyze.

Now, the second step is to use --decode to see the internals of the FileDescriptorSet. To do that, assuming that you have the descriptor.proto file in /usr/local/include/google/protobuf, you can run the following:

$ cat name.desc | protoc -I/usr/local/include/google/protobuf --decode=google.protobuf.FileDescriptorSet /usr/local/include/google/protobuf/descriptor.proto

This should output something like this:

file {
  name: "name.proto"
  message_type {
   &...