Writing your first message in Protocol Buffers

We will look into how to write messages in Protocol Buffers. We will learn about the various types that are available to us.

2 December, 2022

Contributors

Arnab Sen

@arnabsen1729

In this blog, we will write our first message in Protocol Buffers. We will learn about the various types that we can use and discuss the various points to keep in mind while writing our protobuf schema.

💭 Let's Recall

Let's recall what we learned so far about Protocol Buffers:

We first define the protobuf schema of the data that we want to serialize/deserialize. Then we use the protoc compiler to generate code in the target language, such as Python, Java, or Go. We use the functions in the generated code to serialize our data to raw binary, and the corresponding deserialize functions to reconstruct the data from that binary.

Here is a flow diagram explaining this:

The most common use of Protobuf is communication between services, and the framework it is most often used with is gRPC.

Source: https://grpc.io/docs/what-is-grpc/introduction/

✍️ Declaring a message

There are two versions of the proto syntax, proto2 and proto3. We specify which version we are using with a syntax statement, which must be the first non-empty, non-comment line of the proto file. If it is missing, protoc assumes proto2.
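As a minimal sketch, a proto3 file would begin like this (the comment is only there for illustration):

```protobuf
// The syntax statement comes before any message definitions.
syntax = "proto3";
```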

Now, to define our data we create a message. You can think of a message like a class.

Inside the message, we specify the various fields. Each field follows the format <type> <field-name> = <field-number>;
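Putting these pieces together, a small message could look like the sketch below. The User message and its fields (name, age, is_active) are illustrative assumptions, not a schema taken from a real project.

```protobuf
syntax = "proto3";

// A message groups related fields, much like a class groups attributes.
message User {
  // <type> <field-name> = <field-number>;
  string name = 1;
  int32 age = 2;
  bool is_active = 3;
}
```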

Note: If a field is not set when the object is created, it is left out of the serialized output and reads back as the default value for its type after deserialization.

🧰 Type support in proto3

In the first part of declaring the field, we specify the type of the field. The type can be:

Number

For numbers we can use the following types of values:

• Signed Representation: int32, int64, sint32, sint64.

• Unsigned Representation: uint32, uint64.

• Fixed Representation: fixed32, fixed64 (unsigned), sfixed32, sfixed64 (signed).

• Floating-Point Representation: float, double.

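Both int32 and int64 use variable-length encoding (varints), i.e., small values take fewer bytes in the serialized output than large ones.

For fixed32 and fixed64, by contrast, the length is fixed: always 4 bytes and 8 bytes respectively.

If a field will often hold negative numbers, prefer sint32 and sint64 over int32 and int64. The difference is size, not precision: a negative int32 or int64 always takes 10 bytes as a varint, while the sint types use ZigZag encoding, so negative numbers with a small magnitude stay small.

The default value for all numeric types is 0.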
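The numeric types can be seen side by side in the sketch below. The Measurement message and its field names are hypothetical and chosen only to hint at when each type might fit. The byte counts in the comments refer to the encoded value, not including the field's tag.

```protobuf
syntax = "proto3";

// Hypothetical message, used only to compare the numeric types.
message Measurement {
  int32 sample_count = 1;       // varint: 1 byte for 0-127, 10 bytes if negative
  sint32 temperature_delta = 2; // ZigZag varint: -1 encodes in 1 byte
  uint64 total_bytes = 3;       // unsigned varint
  fixed32 checksum = 4;         // always 4 bytes, unsigned
  sfixed64 offset_ns = 5;       // always 8 bytes, signed
  float ratio = 6;              // 32-bit floating point, always 4 bytes
  double precise_ratio = 7;     // 64-bit floating point, always 8 bytes
}
```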

Boolean

We define boolean fields with the type bool. It can have values true or false.

The default value for a boolean field is false.

String

The keyword is string. It can have arbitrary length and the default value is an empty string.

Note: A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32.

Bytes

We use the bytes type when the message carries raw binary data, such as an image. Just like string, the default value of a bytes field is empty.
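A short sketch of the three non-numeric scalar types, with their defaults noted in comments. The Profile message and its field names are assumptions made for illustration.

```protobuf
syntax = "proto3";

// Hypothetical message showing bool, string, and bytes fields.
message Profile {
  bool is_verified = 1;    // defaults to false
  string display_name = 2; // UTF-8 text, defaults to ""
  bytes avatar = 3;        // raw bytes such as image data, defaults to empty
}
```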

🔢 Field Numbers

Every time we define a field in a proto message, we specify a number at the end. These are called field numbers. Within a message, each field must have its own unique field number. It is this number, not the field name, that identifies the field in the serialized binary, so the field names do not appear in the encoded output.

Note: Once a message is in use, you should not change the field numbers of its fields. Doing so makes the affected fields unreadable in previously serialized data. For example, if the field name used the number 2 and you later change it to the number 3, the new deserializer will not find name in the old data.
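The renumbering problem can be sketched with two versions of the same schema. In practice this would be one message edited over time; the names UserV1 and UserV2 are hypothetical and are used here only so that both versions fit in one file.

```protobuf
syntax = "proto3";

// Schema in use when the old data was written.
message UserV1 {
  string name = 2;
}

// Schema after the field was renumbered.
// Old data stores the name under field number 2, which this message
// does not define, so name reads back as "" (the default).
message UserV2 {
  string name = 3;
}
```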

The smallest field number is 1. Field numbers in the range 1 through 15 take one byte to encode, and field numbers in the range 16 through 2047 take two bytes. So keep the numbers 1 through 15 for fields that occur frequently and are unlikely to change.
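One way to apply this guideline is sketched below. The fields other than name are hypothetical, and which fields count as frequent depends on your own data.

```protobuf
syntax = "proto3";

message User {
  // Frequently set fields: numbers 1-15 need a one-byte tag.
  string name = 1;
  int32 age = 2;

  // Rarely set fields: numbers 16-2047 need a two-byte tag.
  string biography = 16;
  string referral_code = 17;
}
```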

🔁 Repeated Fields

A field does not always hold a single value. Sometimes it holds a list of items, and we use the repeated keyword to declare that.

Let's say we want to add a field called hobbies in our User message. Then we can do that by:
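A minimal sketch, assuming each hobby is a plain string and that field number 2 is unused in User (the name field is a placeholder):

```protobuf
syntax = "proto3";

message User {
  string name = 1;             // placeholder field
  repeated string hobbies = 2; // zero or more strings
}
```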

In the serialized output the order of the values will be preserved and by default, the value of a repeated field will be an empty list.

Enumerations (or Enums)

Some fields can only take one of a few predefined values. For example, a user can have a favorite color, which can be RED, GREEN, BLUE, etc.

To define enums in the protobuf schema we use the keyword enum.

A few things to note about enums are:

• The first value in an enum must be assigned the number 0, unlike field numbers, which start at 1.

• The default value of an enum field is that first value (the zero value).
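Both points show up in the sketch below. The Color enum uses the values mentioned above, while the field name favorite_color and its field number are assumptions.

```protobuf
syntax = "proto3";

enum Color {
  RED = 0;   // the first value must be 0, and it is also the default
  GREEN = 1;
  BLUE = 2;
}

message User {
  string name = 1;
  Color favorite_color = 2; // reads as RED when it is not set
}
```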

Nested Types

We can also nest messages inside messages. Take a look at this schema for example:
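A minimal sketch of a nested message follows. The Address message and its fields are hypothetical and stand in for whatever data belongs inside User.

```protobuf
syntax = "proto3";

message User {
  // Address is declared inside User, so its full name is User.Address.
  message Address {
    string street = 1; // field numbers are scoped per message,
    string city = 2;   // so Address can reuse 1 and 2
  }

  string name = 1;
  Address address = 2;
}
```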

Conclusion

So, we learned about the various field types, how to number the fields, and how to create nested messages.

In the next posts, we will dive deep into how to integrate protocol messages with your programming language and create a service to send and receive serialized data.

Feel free to follow me on Showwcase and other platforms. You can find the links here: arnabsen.dev/links
