Step-By-Step understanding of Protocol Buffers
30 November, 2022
When we have to pass data from one server to another, or from one process to another, we usually need some way to encode that data. This encoding is called serialization. Say you have to send your name to someone; you could do that in one of the following ways:
Method 1: Arnab<space>Sen
Method 2: {"fname":"Arnab", "lname":"Sen"}
Method 3: fname=Arnab;lname=Sen;
Method 1 is pretty standard; we do that almost every day. We simply separate the entities with a space. We can be more specific, as in Method 2, where we explicitly say that "Arnab" is "fname" and "Sen" is "lname". You might be familiar with this format; it is called JSON. Similarly, you can come up with your own format, as we have with Method 3.
We took a piece of information, i.e. a name, and serialized it so that when we send the serialized data to the other person, they can deserialize it and get the required information back.
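As a rough illustration of that round trip, here is a minimal JavaScript sketch of the three formats. The helper names (serializeSpace, deserializeKeyValue, and so on) are made up for this example and are not part of any library.

```javascript
// Hypothetical helpers: each pair serializes a name and reads it back.
const name = { fname: "Arnab", lname: "Sen" };

// Method 1: separate the fields with a space.
const serializeSpace = (n) => `${n.fname} ${n.lname}`;
const deserializeSpace = (s) => {
  const [fname, lname] = s.split(" ");
  return { fname, lname };
};

// Method 2: JSON, where every value is labelled with its key.
const serializeJson = (n) => JSON.stringify(n);
const deserializeJson = (s) => JSON.parse(s);

// Method 3: a home-grown "key=value;" format.
const serializeKeyValue = (n) => `fname=${n.fname};lname=${n.lname};`;
const deserializeKeyValue = (s) =>
  Object.fromEntries(
    s
      .split(";")
      .filter(Boolean) // drop the empty piece after the trailing ";"
      .map((pair) => pair.split("="))
  );

// Serialize with each method, then deserialize what was produced.
console.log(serializeSpace(name), deserializeSpace(serializeSpace(name)));
console.log(serializeJson(name), deserializeJson(serializeJson(name)));
console.log(serializeKeyValue(name), deserializeKeyValue(serializeKeyValue(name)));
```

Note that Method 1 only works because neither part of the name contains a space; both sides have to agree on the format and its limits, which is exactly the problem a schema is meant to solve.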
Protocol Buffers is one such method to serialize structured data. It was built by Google and is completely open source.
Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.
While interning at Google, I had to work with Protocol Buffers (protobuf for short) a lot. In this article we will discuss why Google and many other organizations prefer protobufs.
My Desktop Setup
One interesting thing to note about Protobuf is that it is language neutral, so no matter which language you are using you can use protobufs with no headaches.
NOTE: There is a difference between language-neutral and language-agnostic. Protocol buffers are language-neutral because they support many different languages, but we have to do some configuration for each of them. Language-agnostic, on the other hand, means being totally independent of the programming language.
💭 Understanding the Schema
In Protocol Buffers, we specify a schema that is required for serialization. A schema is like a blueprint for the objects we want to create. Schemas are usually stored in a *.proto file.
In a schema, we specify the type of each field, then the field name, followed by an integer, which is the field number that identifies the field in the serialized data.
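For illustration, a schema of this shape might look like the sketch below. It is an assumption, not necessarily the exact schema used in this article: the field names follow the id, fname and lname fields mentioned later, and the field numbers are arbitrary.

```proto
// Hypothetical schema: type, then field name, then field number.
syntax = "proto3";

message User {
  uint32 id = 1;
  string fname = 2;
  string lname = 3;
}

// A list of users; "repeated" marks a field that can hold many values.
message Users {
  repeated User users = 1;
}
```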
✨ Advantages of Protocol Buffers
Let's quickly go through the basic features/advantages that protocol buffers provide us:
- The schema defines the type of each field. This might seem trivial at first, because JSON also has types (boolean, string, number, etc.). The difference is that protocol buffers are strongly typed, i.e. we cannot put a string in a uint32 field. In JSON, by contrast, nothing stops us from doing that.
- All the logic (code) for serialization and deserialization is generated by the protocol buffer compiler, so we don't have to write it. This code can be generated in C++, Python, JavaScript, Go, Dart, and many more languages. That way we can focus on defining the schema and not think about how to serialize the data.
- The protocol buffers are designed in a way that they can support evolution. In other words, if we change the schema, it can still be backward compatible and we won't be breaking the data serialization and deserialization process.
- Another small advantage is the ability to add comments. This might seem very trivial but it's very useful in big codebases and large organizations to document the message and the fields.
- The biggest advantage that protocol buffers provide is that the data is serialized as binary and not as a string. This makes a huge difference to the size of the serialized data, which is then written to disk or sent over the network. We will test this out later in this blog.
At the same time, the fact that the serialized data is binary makes debugging a little difficult.
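To get a feel for why the binary form is smaller, here is a hand-rolled sketch of how a protobuf-style encoder lays out three fields, compared with the same data as JSON text. It is not the code that the protocol buffer compiler generates; the field numbers (1, 2, 3) and the sample values are assumptions made for this example.

```javascript
// Encode an unsigned 32-bit integer as a varint: 7 bits per byte,
// with the high bit set on every byte except the last one.
function encodeVarint(value) {
  const bytes = [];
  while (value > 0x7f) {
    bytes.push((value & 0x7f) | 0x80);
    value >>>= 7;
  }
  bytes.push(value);
  return bytes;
}

const WIRE_VARINT = 0; // wire type used for integers
const WIRE_LEN = 2; // wire type used for strings (length-delimited)

// Every field starts with a key made of its field number and wire type.
const fieldKey = (fieldNumber, wireType) =>
  encodeVarint((fieldNumber << 3) | wireType);

function encodeUint32Field(fieldNumber, value) {
  return [...fieldKey(fieldNumber, WIRE_VARINT), ...encodeVarint(value)];
}

function encodeStringField(fieldNumber, text) {
  const utf8 = new TextEncoder().encode(text);
  return [
    ...fieldKey(fieldNumber, WIRE_LEN),
    ...encodeVarint(utf8.length),
    ...utf8,
  ];
}

// Hypothetical sample record and field numbers.
const user = { id: 1001, fname: "Arnab", lname: "Sen" };
const binary = Uint8Array.from([
  ...encodeUint32Field(1, user.id),
  ...encodeStringField(2, user.fname),
  ...encodeStringField(3, user.lname),
]);
const jsonBytes = new TextEncoder().encode(JSON.stringify(user));

console.log(`protobuf-style binary: ${binary.length} bytes`); // 15 bytes
console.log(`JSON text: ${jsonBytes.length} bytes`); // 41 bytes
```

The field names never appear in the binary output; only the small field numbers do, which is where most of the saving comes from. It is also why the bytes are hard to read without the schema.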
In this article, we will install the protocol buffer compiler and then see some implementation. We will move deep into protocol buffers in future blogs of this series.
🔧 Installing the Protocol Buffer Compiler
The protocol buffer compiler, also known as protoc, can be installed with popular package managers:
Linux (for example, using apt): apt install -y protobuf-compiler
macOS (using Homebrew): brew install protobuf
Others
Visit the protobuf release page on GitHub and download the zip file for your operating system and computer architecture (protoc-<version>-<os><arch>.zip).
GitHub release page of protoc binary
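After extracting the archive, make sure the protoc binary is on your PATH.
🛠️ Setting up the project
Create a new directory for the project and, inside it, a file named users.proto.
NOTE: If you are using VSCode, install the vscode-proto3 extension for better protocol buffer support.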
vscode-proto3 extension of VSCode
✍️ Writing the Schema
Write the schema in the users.proto file.
⚙️ Compiling the proto message
We will use the protoc compiler to generate the code for serialization and deserialization.
From the directory containing users.proto, run protoc in your terminal to generate the JavaScript code.
This produces a file named users_pb.js, and it has all the code necessary for serializing and deserializing the Users.
Generated users_pb.js file
🪛 Using protobuf structures
To use the users_pb.js file we have to install a package called google-protobuf. To do that, let's first initialize our directory as an npm project.
Then create a file named index.js.
The generated module, imported here as Schema, has a class called User, so we can create an instance of User, which we will call user1.
The protoc compiler has already generated methods for us to set the various params of this object, like id, fname, and lname.
We can populate user1 by simply invoking those methods.
Finally, write the serialized protobuf data to users.bin and the equivalent JSON to users.json. Compare the sizes of the two:
Comparison of sizes of Protobuf Serialised data and JSON data
The size of the protobuf data is 34 bytes and that of the JSON is 81 bytes. In other words, the protobuf output is about 42% of the JSON size (34 / 81 ≈ 0.42), a saving of roughly 58%.
📝 Conclusion