You're reading from Protocol Buffers Handbook

Product type: Book

Published in: Apr 2024

Publisher: Packt

ISBN-13: 9781805124672

Edition: 1st Edition

Concepts

Application Development

Author (1)

Clément Jean

The Protobuf Compiler

Now that we know the Protobuf syntax and the text format, we can finally get our hands dirty and play with the Protobuf compiler. In this chapter, we are going to generate code from .proto files, get binary from Protobuf text format, and get text from Protobuf serialized data (binary files).

In this chapter, we’re going to cover the following main topics:

Downloading and installing protoc
Transpiling .proto files
Specifying import paths
Encoding data to type with --encode
Decoding data to type with --decode
Decoding data to type, without .proto files, with --decode_raw
What about the other flags?

By the end of this chapter, you will know how to use protoc’s main flags. You will know how to generate C++ (or any other supported language) code, how to tell protoc where to find the imported files in your .proto files, and how to get binary data out of data described with text format.

Technical requirements

All the code examples that you will see in this section can be found in the Chapter4 directory in the GitHub repository (https://github.com/PacktPublishing/Protocol-Buffers-Handbook).

Downloading and installing protoc

Important note

For Windows users, I highly recommend you install protoc by using a package manager such as Chocolatey (https://chocolatey.org/) or any other one you want. Installing protoc header files is tricky, but they are necessary for getting Well-Known Type definitions. For Chocolatey, you should be able to run the following command:

$ choco install protoc

Before even thinking about code generation and serialization, we need to install the compiler. Depending on your needs, there are multiple ways of doing this. I am going to show two. The first one is downloading protoc from the GitHub Releases page (https://github.com/protocolbuffers/protobuf/releases), and the second one is installing it with a tool such as curl or wget.

GitHub Releases page

There, you will have a list of different precompiled binaries for different platforms (Linux, macOS) and for different architectures (arm, x86). For a given version, you will have a...

Transpiling .proto files

We are finally ready to generate some code from the .proto file. And even though we are going to use the compiler for other tasks, this is the main one you will use protoc for. In this section, we will generate code in C++ and Go. This is not a random choice. One is a directly supported language for protoc, and the other is supported by adding/downloading a protoc plugin. By seeing how to generate code for these two languages, you should be able to generate code for any other language.
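To make the difference between the two cases a bit more concrete, here is a minimal Python sketch that builds the protoc command line for a built-in generator (C++) and for a plugin-based one (Go). The helper names (plugin_executable, missing_plugin, codegen_command) and the BUILT_IN set are assumptions made for this illustration; they are not part of protoc, and the set of bundled generators may differ between releases. What the sketch relies on is the plugin convention: when protoc receives --NAME_out and NAME is not built in, it looks for an executable called protoc-gen-NAME on the PATH.

```python
import shutil

# Generators that ship inside protoc itself. Illustrative and not exhaustive:
# the exact set depends on the protoc release you installed.
BUILT_IN = {"cpp", "csharp", "java", "objc", "php", "python", "ruby"}


def plugin_executable(language):
    """Executable protoc searches for on PATH when given --<language>_out."""
    return f"protoc-gen-{language}"


def missing_plugin(language):
    """Return the plugin name if it is needed but not installed, else None."""
    if language in BUILT_IN:
        return None
    name = plugin_executable(language)
    return None if shutil.which(name) else name


def codegen_command(language, out_dir, proto_file):
    """Build (without running) the protoc command that generates code."""
    return ["protoc", f"--{language}_out={out_dir}", proto_file]
```

With protoc installed, passing the result of codegen_command("cpp", ".", "user.proto") to subprocess.run would generate the C++ files in the current directory. For Go, missing_plugin("go") is a quick way to check whether protoc-gen-go still needs to be installed.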

Why code generation?

Before even generating code, we need to understand what the point of generating code from a .proto file is. So far, we have mostly talked about Protobuf as an abstract concept that can serialize and deserialize data. But, in later chapters, we will start using Protobuf serialization and deserialization in code.

To do so, instead of adding a dependency and calling exposed functions, Protobuf relies on generated code to manage all calls to the lower...

Specifying import paths

We saw that we can import files in Protobuf, but up until now, we only saw the syntax. If you do not remember, this looks like the following:

import "proto/a.proto";

Now, because the string after the import keyword is mostly a path, we might find ourselves with protoc not being aware of where this file is. This might happen in the following situations:

We want to keep the import path “clean,” meaning that we want all files in the project to be imported from a certain folder. For example, the proto directory is commonly used, and we could have all .proto files under this folder.
We want to build the .proto files from a directory that cannot directly access the imported .proto file from the current location; for example, if we wanted to have shared libraries for multiple projects.

If you used GCC or Clang in the C/C++ world, this will feel very familiar to you. If you did not, do not worry; this is as simple as it gets.
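The lookup that protoc performs for an import can be pictured with a small Python sketch. This is only a model of the behavior, not protoc's actual implementation: the resolve_import function is a name invented for this illustration. It tries each directory passed with --proto_path (or -I) in the order given and stops at the first one that contains the imported file.

```python
from pathlib import Path


def resolve_import(import_name, proto_paths):
    """Return the first existing file matching an import, or None.

    Models the lookup order: each --proto_path (-I) directory is tried in
    the order given, and the first match wins.
    """
    for directory in proto_paths:
        candidate = Path(directory) / import_name
        if candidate.is_file():
            return candidate
    return None
```

For example, resolve_import("proto/a.proto", ["project", "shared"]) returns shared/proto/a.proto when the file only exists under the shared directory, which corresponds to running protoc with -Iproject -Ishared.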

...

Encoding data to type with --encode

Now, we will start to see flags that are important for learning Protobuf and inspecting the serialized data. We will start with the flag called --encode.

As its name suggests, the --encode flag is used to encode data. It will take some data and turn it into binary (serialization). This is especially useful at this point in the book because we can inspect data without having to write code yet. We simply need protoc and the knowledge we have on how to write in Protobuf Text Format.

Our goal in this section is not so much understanding the binary produced. We care about generating it first. In the next chapter, we will talk about the binary format. So, let us just write a simple textpb file and encode it with the --encode flag.

We will have the following textpb file (encode/user.txtpb):

id: 42
name: "Clément"

We will also have the following .proto file (encode/user.proto):

syntax = "proto3";
message User...
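If you prefer to drive protoc from a script, the same operation can be sketched in Python. The function names (encode_command, encode) are assumptions made for this illustration. The sketch relies only on what we described above: --encode reads text format from the standard input and writes the binary to the standard output.

```python
import subprocess


def encode_command(message_type, proto_file, proto_path="."):
    """Build the argument list for `protoc --encode`."""
    return [
        "protoc",
        f"--proto_path={proto_path}",
        f"--encode={message_type}",
        proto_file,
    ]


def encode(text, message_type, proto_file, proto_path="."):
    """Feed text-format data on stdin and return the binary from stdout.

    Requires protoc on PATH. Raises CalledProcessError if protoc rejects
    the input (for example, because of an unknown field name).
    """
    completed = subprocess.run(
        encode_command(message_type, proto_file, proto_path),
        input=text.encode("utf-8"),
        capture_output=True,
        check=True,
    )
    return completed.stdout
```

Run from the directory that contains user.proto, encode('id: 42\nname: "Clément"\n', "User", "user.proto") should return the serialized bytes, which you can then write to a file or inspect.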

Decoding data to type with --decode

Similarly, we have the --decode flag, which takes binary data and returns it in text format. Once again, this flag is mostly for debugging and, in our case, for learning.

Now, remember that we already did the opposite of decode. This means that we will be able to take the output of encode, redirect it to decode, and we should get our input back. This would look like the following:

input > encode > decode > input

So, let us start with the encode part. We are already familiar with it; we can just execute the following command:

$ cat user.txtpb | protoc --encode=User user.proto

Or, on Windows with PowerShell, we could execute this command:

$ Get-Content user.txtpb | protoc --encode=User user.proto

For convenience, we will redirect the standard output of these commands to a file, like so:

$ … > user.bin

With that, we can now see how to use --decode. It is very similar to --encode. It takes...
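The input > encode > decode > input round trip can also be sketched in Python. The names run_protoc and round_trip are assumptions made for this illustration, and the convert parameter exists only so that the pipeline can be exercised without protoc installed. Note that the decoded text may not be byte-for-byte identical to what you typed (protoc may, for instance, escape non-ASCII characters), so compare the content rather than the exact formatting.

```python
import subprocess


def run_protoc(mode, message_type, proto_file, data):
    """Run `protoc --encode` or `protoc --decode`, piping data through stdin."""
    if mode not in ("encode", "decode"):
        raise ValueError(f"unsupported mode: {mode}")
    completed = subprocess.run(
        ["protoc", f"--{mode}={message_type}", proto_file],
        input=data,
        capture_output=True,
        check=True,
    )
    return completed.stdout


def round_trip(text, message_type, proto_file, convert=run_protoc):
    """Encode text format to binary, decode it back, and return the text."""
    binary = convert("encode", message_type, proto_file, text.encode("utf-8"))
    return convert("decode", message_type, proto_file, binary).decode("utf-8")
```

With protoc on the PATH and user.proto in the current directory, round_trip("id: 42\n", "User", "user.proto") performs both steps in one call.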

Decoding data to type without .proto files, with --decode_raw

The final flag that I want to present here is --decode_raw. Now, before even getting to why we would want this flag, it is important to recognize the constraints of --encode and --decode. There are two of them.

The first one is that we need to know which type the data needs to be serialized into or was serialized into. In situations where you are trying to reverse engineer a solution or where you do not have much documentation, it is effectively impossible to use these two flags.

An example might be useful. Let us say that you find a file called an_app.preferences_pb on your Android phone (by the way, this is a real thing; check https://developer.android.com/codelabs/android-proto-datastore). You are not the developer of “an_app” but you still want to inspect the file and make sure that it is not storing sensitive information in “plain text.” Now, you read this book, and you are thinking that...
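To get an intuition for what can be recovered without a schema, here is a minimal Python sketch of a raw decoder. It is a simplification and not how protoc implements --decode_raw: it ignores groups and does not try to interpret length-delimited values as nested messages. The function names (read_varint, decode_raw) are assumptions made for this illustration. The point is that the binary carries field numbers and wire types but no field names, which is why the output of a raw decode is only semi-readable.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7


def decode_raw(buf):
    """Return a list of (field_number, wire_type, value) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
```

As a usage example, assume a message whose field 1 is an integer and whose field 2 is a string (hypothetical field numbers chosen for this illustration). Decoding b"\x08\x2a\x12\x08" followed by "Clément" encoded as UTF-8 yields field 1 with the value 42 and field 2 with the raw bytes of the name, without ever telling us that the fields were called id and name.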

What about the other flags?

Obviously, after taking a look at the output of protoc --help, you cannot help but wonder what all these other flags are doing. For the sake of brevity, I do not cover them all here, but I thought it would be nice to mention some other flags and let you play with them. Consider this as a mini-challenge.

The first one that I particularly like is --descriptor_set_out. Now, we have not talked about Descriptor types yet. We will see them in more detail later in the book when we manipulate them, but for now, all you need to know is that they are messages that represent Protobuf schema constructs. What this means is that we can encode the schema itself into binary.

For this mini-challenge, you will need to write a .proto file and encode it to binary with --descriptor_set_out. Once this is done, you will need to use --decode to inspect the content. Note that you have access to the .proto file where FileDescriptorSet (the type it serializes to) is defined...

Summary

In this chapter, we learned how to use protoc to generate code and serialize/deserialize data. We saw that we can generate code for directly supported languages and ones that need plugins installed. We then saw how to encode and decode data when we have access to the .proto file and, therefore, the type definitions. And finally, we saw that even if we do not have the .proto file and type definitions, we can get a semi-readable text format output, to help us get a feel for what data is encoded.

In the next chapter, we will learn the serialization internals. We will dive deep into the binary and understand how each kind of data is serialized/deserialized. This will use the skills that we learned in this chapter to get that new knowledge.

Quiz

What is the short version of the --proto_path flag?
1. -I
2. -p
3. -1
Which flag would generate Java code?
1. --java-out
2. --java-gen-out
3. --java_out
When would you use --decode_raw instead of --decode?
1. When I do not have access to the .proto file
2. When I do not know which type the data was serialized into
3. Both of these

Answers

Challenge solutions

Challenge 1 – Descriptors

In this challenge, we need to use the --descriptor_set_out flag to generate binary out of our schema. Let us define a simple schema (name.proto):

syntax = "proto3";
message Name {
  string name = 1;
}

To generate a FileDescriptorSet out of it, we need to run the following command:

$ protoc --descriptor_set_out=name.desc name.proto

This will create a name.desc file that we can then analyze.

Now, the second step is to use --decode to see the internals of the FileDescriptorSet. To do that, assuming that you have the descriptor.proto file in /usr/local/include/google/protobuf, you can run the following:

$ cat name.desc | protoc -I/usr/local/include/google/protobuf --decode=google.protobuf.FileDescriptorSet /usr/local/include/google/protobuf/descriptor.proto

This should output something like this:

file {
  name: "name.proto"
  message_type {
    ...


Author (1)

Clément Jean

Clément Jean is the CTO of Education for Ethiopia, a start-up focusing on educating K-12 students in Ethiopia. On top of that, he is also an online instructor (on Udemy, Linux Foundation, and others) teaching people about different kinds of technologies. In both his occupations, he deals with technologies such as Protobuf and gRPC and how to apply them to real-life use cases. His overall goal is to empower people through education and technology.