Step-By-Step understanding of Protocol Buffers

We will learn why organisations use Protocol Buffers, what advantages they offer, and walk through a simple implementation.

30 November, 2022

Contributors

Arnab Sen

@arnabsen1729

When we have to pass data from one server to another, or from one process to another, we usually need some way to encode that data. This encoding is called serialization. Let's say you have to send your name to someone; you can do that using one of the following methods:

1. Arnab<space>Sen
2. {"fname":"Arnab", "lname":"Sen"}
3. fname=Arnab;lname=Sen;

Method 1 is pretty standard; we do it almost every day by simply separating the entities with a space. We can be more specific, as in Method 2, where we explicitly say that "Arnab" is "fname" and "Sen" is "lname". You might be familiar with this format: it is called JSON. Similarly, you can come up with your own format, as we did with Method 3.

We took a piece of information, i.e. a name, and serialized it so that when we send the serialized data to another person, they can deserialize it and get the required information back.
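To make the round trip concrete, here is a minimal sketch of the three methods in JavaScript. The helper names (toSpaceSeparated, fromKeyValue, and so on) are made up for this illustration, and the parsers assume the values contain no spaces, semicolons, or equals signs.

```javascript
// Three ways to serialize the same name, each paired with its deserializer.
const person = { fname: 'Arnab', lname: 'Sen' };

// Method 1: space-separated values (the reader must already know the field order).
const toSpaceSeparated = (p) => `${p.fname} ${p.lname}`;
const fromSpaceSeparated = (s) => {
  const [fname, lname] = s.split(' ');
  return { fname, lname };
};

// Method 2: JSON, where every value is labelled with its field name.
const toJson = (p) => JSON.stringify(p);
const fromJson = (s) => JSON.parse(s);

// Method 3: the custom key=value; format.
const toKeyValue = (p) => `fname=${p.fname};lname=${p.lname};`;
const fromKeyValue = (s) =>
  Object.fromEntries(
    s
      .split(';')
      .filter(Boolean) // drop the empty piece after the trailing ';'
      .map((pair) => pair.split('='))
  );

const methods = [
  [toSpaceSeparated, fromSpaceSeparated],
  [toJson, fromJson],
  [toKeyValue, fromKeyValue],
];

for (const [serialize, deserialize] of methods) {
  const wire = serialize(person);
  console.log(wire, '->', deserialize(wire));
}
```

All three recover the same object; what differs is how much of the structure travels with the data and how much both sides must agree on in advance.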

Protocol Buffers is one such method for serializing structured data. It was built by Google and is completely open source.

Protocol Buffers GitHub Repo

https://github.com/protocolbuffers/protobuf

Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.

While interning at Google, I worked with Protocol Buffers (protobuf for short) a lot. In this article, we will discuss why Google and many other organizations prefer protobufs.


One interesting thing to note about protobuf is that it is language-neutral, so no matter which language you are using, you can use protobufs without headaches.

NOTE: There is a difference between language-neutral and language-agnostic. Protocol buffers are language-neutral because they support many different languages, but each language needs some configuration. Language-agnostic, in contrast, means being totally independent of the programming language.

💭 Understanding the Schema

In Protocol Buffers, we specify a schema that is required for serialization. A schema is like a blueprint for the objects we want to create. All the schemas are usually stored in a *.proto file.

At a quick glance, each field in a schema is declared with its type, then the attribute name, followed by an integer (the field number).

✨ Advantages of Protocol Buffers

Let's quickly go through the basic features/advantages that protocol buffers provide us:

• The schema defines the type of each field. This might seem trivial at first because JSON also has types (boolean, string, number, etc.). The difference is that protocol buffers are strongly typed, i.e. we cannot put a string in a uint32 field, whereas JSON lets us do that.
• All the logic (code) for serialization and deserialization is generated by the protocol buffer compiler; we don't have to write it. This code can be generated in C++, Python, JS, Golang, Dart, and many more languages, so we can focus on defining the schema and not think about how to serialize the data.
• Protocol buffers are designed to support evolution. In other words, if we change the schema, it can still be backward compatible and we won't break the serialization and deserialization process.
• Another small advantage is the ability to add comments. This might seem trivial, but it is very useful in big codebases and large organizations for documenting messages and fields.
• The biggest advantage is that the data is serialized as binary and not as a string. This makes a huge difference in the size of the serialized data that is written to disk or shared over the network. We will test this out later in this blog.
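The schema evolution point can be illustrated with a hand-written decoder. This is only a sketch of the idea, not the code that protoc generates. It assumes field numbers 1 (id) and 2 (fname), which the article does not state, plus a hypothetical field 4 added by a newer schema. On the wire, every field starts with a tag equal to (field number << 3) | wire type, which lets an older reader step over fields it does not know.

```javascript
// Read a base-128 varint starting at `offset`; returns [value, nextOffset].
// Sketch only: handles values that fit in 32 bits.
function readVarint(bytes, offset) {
  let value = 0;
  let shift = 0;
  while (true) {
    const byte = bytes[offset++];
    value |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) break; // high bit clear marks the last byte
    shift += 7;
  }
  return [value >>> 0, offset];
}

// An "old" reader that only knows field 1 (id) and field 2 (fname).
// The field numbers are assumptions made for this sketch.
function decodeOldUser(bytes) {
  const user = { id: 0, fname: '' };
  let offset = 0;
  while (offset < bytes.length) {
    let tag;
    [tag, offset] = readVarint(bytes, offset);
    const fieldNumber = tag >>> 3;
    const wireType = tag & 0x7;
    if (wireType === 0) {
      // Varint payload: always consume it, store it only if the field is known.
      let value;
      [value, offset] = readVarint(bytes, offset);
      if (fieldNumber === 1) user.id = value;
    } else if (wireType === 2) {
      // Length-delimited payload: the length prefix tells us how far to skip.
      let length;
      [length, offset] = readVarint(bytes, offset);
      const chunk = bytes.subarray(offset, offset + length);
      offset += length;
      if (fieldNumber === 2) user.fname = chunk.toString('utf8');
    } else {
      throw new Error(`wire type ${wireType} is not handled in this sketch`);
    }
  }
  return user;
}

// Bytes as a newer writer might produce them, including the unknown field 4.
const newerBytes = Buffer.from([
  0x08, 0x07,                               // field 1, varint: 7
  0x12, 0x05, 0x41, 0x72, 0x6e, 0x61, 0x62, // field 2, string: "Arnab"
  0x22, 0x03, 0x61, 0x40, 0x62,             // field 4, string: "a@b"
]);

console.log(decodeOldUser(newerBytes)); // { id: 7, fname: 'Arnab' }
```

The old reader never needs to understand field 4; the wire type and length are enough to skip it, which is why adding fields does not break existing readers.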

At the same time, the fact that the serialized data is binary makes debugging a little difficult.
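Both sides of this trade-off show up in a small hand-rolled encoder. This is a rough sketch of the protobuf wire format for a single user, not the generated code. The field numbers (1 for id, 2 for fname, 3 for lname) and the id value 1001 are assumptions chosen for illustration.

```javascript
// Encode a non-negative integer (below 2^31 in this sketch) as a base-128 varint.
function encodeVarint(value) {
  const out = [];
  while (value > 0x7f) {
    out.push((value & 0x7f) | 0x80); // low 7 bits, high bit set: more bytes follow
    value >>>= 7;
  }
  out.push(value);
  return Buffer.from(out);
}

// A field tag is (fieldNumber << 3) | wireType, itself written as a varint.
const tag = (fieldNumber, wireType) => encodeVarint((fieldNumber << 3) | wireType);

// Wire type 0: varint payload.
const varintField = (fieldNumber, value) =>
  Buffer.concat([tag(fieldNumber, 0), encodeVarint(value)]);

// Wire type 2: length-delimited payload (used for strings).
const stringField = (fieldNumber, text) => {
  const payload = Buffer.from(text, 'utf8');
  return Buffer.concat([tag(fieldNumber, 2), encodeVarint(payload.length), payload]);
};

// Field numbers 1, 2, 3 are assumed for this sketch.
const user = { id: 1001, fname: 'Arnab', lname: 'Sen' };
const binary = Buffer.concat([
  varintField(1, user.id),
  stringField(2, user.fname),
  stringField(3, user.lname),
]);
const json = Buffer.from(JSON.stringify(user), 'utf8');

console.log(binary.toString('hex'));
console.log(`binary: ${binary.length} bytes, JSON: ${json.length} bytes`);
```

The binary form is smaller because field names are replaced by one-byte tags and the number is not spelled out as text. The hex dump is also much harder to read by eye than the JSON string, which is the debugging cost mentioned above.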

In this article, we will install the protocol buffer compiler and then walk through a small implementation. We will go deeper into protocol buffers in future blogs of this series.

🔧 Installing the Protocol Buffer Compiler

The protocol buffer compiler, also known as protoc, can be installed with popular package managers:

Linux

macOS (using Homebrew)

Others

Visit the GitHub release page and download the zip file for your operating system and computer architecture (protoc-<version>-<os>-<arch>.zip).

GitHub release page of protoc binary

Unzip the archive and you should find the protoc binary inside.

Once the installation is complete, verify it by running protoc --version, which should print the installed version.

🛠️ Setting up the project

First, create a directory, and inside that create a proto file. Let's try to create a schema for a list of Users. So let's call the proto file users.proto.

NOTE: If you are using VSCode, install the vscode-proto3 extension for better protocol buffer support.

✍️ Writing the Schema

Inside the users.proto file

In the first line, we specify the proto version that we are using. Then we specify the schema for an individual User, and then for a list of users called Users. We will dive deep into the various keywords later, but first, let's get an overview of protocol buffers.

⚙️ Compiling the proto message

Now that we have the schema ready, we will use the protoc compiler to generate the code for serialization and deserialization.

So to compile the users.proto run the following command in your terminal:

After running this you will see that a new file is created called users_pb.js and it has all the code necessary for serializing and deserializing the Users.

Generated users_pb.js file

🪛 Using protobuf structures

Now, to use the generated users_pb.js file, we have to install a package called google-protobuf. To do that, first initialize the directory as an npm project (npm init), and then install the package (npm install google-protobuf).

Let's create our own JS file index.js.

First, we will import the generated JS file and name the import Schema.

Now, Schema has a class called User. So we can create an instance of User in this manner:

Here comes the best part: the protoc compiler has already generated methods for setting the various params of this object, like id, fname, and lname.

So we can set the values of user1 by simply invoking those methods.

So the entire JS file now looks like:

If we run this we get:

Similarly, we can create a bunch of users and then write them to disk and see how much space they consume:

For that, we have to tweak the code a little bit:

In the above code, we are first creating a JSON file and then creating a protocol buffer serialized binary file.

Now, run the script with Node (node index.js):

This should create two files users.bin and users.json. Compare the sizes of the two:

Comparison of sizes of Protobuf Serialised data and JSON data

The protocol buffer file is 34 bytes and the JSON file is 81 bytes. The binary file is about 42% of the size of the JSON file, which is a saving of roughly 58%.
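The arithmetic behind this comparison can be checked directly. The snippet below uses the two file sizes reported above; the variable names are made up for this illustration.

```javascript
// File sizes reported in the comparison above, in bytes.
const protobufBytes = 34;
const jsonBytes = 81;

// Fraction of the JSON size that the binary file occupies.
const ratio = protobufBytes / jsonBytes;

// Fraction of the JSON size that is saved by using the binary format.
const saving = 1 - ratio;

console.log(`binary is ${(ratio * 100).toFixed(0)}% of the JSON size`);
console.log(`saving is ${(saving * 100).toFixed(0)}%`);
```

So 42% is the size of the binary file relative to JSON, and the space saved is the remaining 58%.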

📝 Conclusion

So, in this article we learned:

• why protocol buffers are used
• how to create a simple schema
• how to compile a proto and use the schema to serialize data
