Golang Protobuf
Overview
We'll be looking at how to use the Protocol Buffers data format in your Go-based apps in this article. We'll discuss the data format and why it's better than more conventional data formats like XML or even JSON. And to get us started, let's look at a basic example before trying a more difficult one.
Introduction
As a data format, Protocol Buffers are similar to JSON or XML: they store structured data that can be serialized and deserialized by many different languages.
The key benefit of this format is that it is far smaller than formats like XML or even JSON. Google originally created the format because it operates at such a scale that every byte saved makes a difference.
Imagine if we wished to represent a person's data in each of the following three different data formats:
XML
JSON
We could use JSON to describe this data with a much smaller footprint:
Protocol Buffer
And if we expressed the same data in the protocol buffer format:
If you look closely at the wire-encoded output, you will see the name Jainey written out starting at position 2 in the byte array, with J = 74, a = 97, and so on. The byte representation of the age, 33, follows.
There is more to the encoding format than first appears; Google's own documentation on Protocol Buffer encoding covers it in depth. While the JSON and Protocol Buffer representations may be comparable in size at this scale, the savings grow once our data is larger than a typical "getting started" example.
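The size comparison can be reproduced with a short, standard-library-only sketch. It is not the article's original listing: the Person struct, the XML and JSON element names, and the field numbers (name = 1, age = 2) are assumptions made for illustration, and the protobuf bytes are hand-encoded here rather than produced by the protobuf library.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Person is a hypothetical record used only for this size comparison.
type Person struct {
	XMLName xml.Name `xml:"person" json:"-"`
	Name    string   `xml:"name" json:"name"`
	Age     int      `xml:"age" json:"age"`
}

// appendVarint appends v as a base-128 varint, the integer encoding
// used by the protobuf wire format.
func appendVarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// encodePerson hand-encodes the record, assuming name is field 1 (string)
// and age is field 2 (varint). Real code would call proto.Marshal instead.
func encodePerson(name string, age uint64) []byte {
	var b []byte
	b = appendVarint(b, 1<<3|2) // key: field 1, wire type 2 (length-delimited)
	b = appendVarint(b, uint64(len(name)))
	b = append(b, name...)
	b = appendVarint(b, 2<<3|0) // key: field 2, wire type 0 (varint)
	b = appendVarint(b, age)
	return b
}

func main() {
	p := Person{Name: "Jainey", Age: 33}

	x, err := xml.Marshal(p)
	if err != nil {
		panic(err)
	}
	j, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	pb := encodePerson(p.Name, uint64(p.Age))

	fmt.Printf("XML      (%d bytes): %s\n", len(x), x)
	fmt.Printf("JSON     (%d bytes): %s\n", len(j), j)
	fmt.Printf("Protobuf (%d bytes): %v\n", len(pb), pb)
}
```

In the protobuf byte slice, the first two bytes are the field key and the string length, which is why the characters of the name begin at index 2.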
Why use Protocol Buffers?
We'll use a very basic "address book" application as our example, which can read from and write to a file of people's contact information. The address book contains information about each contact, including their name, ID, email address, and phone number.
How is this kind of structured data serialized and retrieved? There are several ways to solve this problem:
- Use gobs to serialize Go data structures. This works well in a Go-specific environment, but it is less effective when you need to share data with programs written for other platforms.
- Invent an ad hoc way to encode the data items into a single string, for example encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it requires writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. It works best for very simple data.
- Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and binding libraries exist for many languages. It can be a good choice if you want to share data with other applications or projects. However, XML is notoriously space-hungry, and encoding and decoding it can impose a large performance penalty on applications. Navigating an XML DOM tree is also considerably more complicated than accessing simple fields in a class.
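To make the ad hoc option concrete, here is a minimal sketch of the one-off encoding and parsing code it requires. The function names encodeInts and decodeInts are invented for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeInts joins the values into a single colon-separated string.
func encodeInts(vals []int) string {
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = strconv.Itoa(v)
	}
	return strings.Join(parts, ":")
}

// decodeInts parses a colon-separated string back into integers.
// Every field is re-parsed from text, which is the run-time cost
// mentioned above.
func decodeInts(s string) ([]int, error) {
	fields := strings.Split(s, ":")
	vals := make([]int, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", f, err)
		}
		vals = append(vals, v)
	}
	return vals, nil
}

func main() {
	s := encodeInts([]int{12, 3, -23, 67})
	fmt.Println(s)

	vals, err := decodeInts(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals)
}
```

Even in this small case the format has no field names, no types, and no room to evolve, which is why the approach only suits very simple data.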
Protocol buffers are the flexible, efficient, automated solution to this problem. With protocol buffers, you write a .proto description of the data structure you want to store. From that description, the protocol buffer compiler generates code that automatically encodes and parses the protocol buffer data in an efficient binary format. The generated code provides accessors for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports extending the format over time in such a way that code can still read data encoded with the old format.
JSON vs Protobuf
Protocol Buffers offer built-in data types, fast serialization and deserialization, versioning, and backward compatibility. JSON's main advantages are its wider adoption and the fact that some languages lack Protocol Buffers support.
Defining Protocol Format
Before you can build your address book application, you need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. In our example, the .proto file that defines the messages is addressbook.proto.
A package declaration at the beginning of the .proto file helps to avoid naming conflicts between several projects.
The go_package option defines the import path of the package that will contain all the generated code for this file. The Go package name will be the last path component of the import path. In our example, the package name is "tutorialpb".
Your message definitions come next. A message is simply an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types.
In our example, the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages: the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values. Here, you want to specify that a phone number can be one of MOBILE, HOME, or WORK.
The " = 1" and " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1 through 15 require one less byte to encode than higher numbers, so as an optimization you can use those tags for commonly used or repeated elements and leave tags 16 and higher for less commonly used elements. Repeated fields are particularly good candidates for this optimization, since each element in a repeated field requires re-encoding the tag number.
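The one-byte boundary at tag 15 follows from how the field key is encoded: the tag number is shifted left by three bits, combined with a three-bit wire type, and written as a varint. The sketch below computes the key size using only the standard library; tagSize is a helper name made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tagSize returns how many bytes the key for a field occupies on the wire.
// The key is (fieldNumber << 3) | wireType, written as a base-128 varint.
func tagSize(fieldNumber, wireType uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], fieldNumber<<3|wireType)
}

func main() {
	for _, n := range []uint64{1, 15, 16, 2047, 2048} {
		fmt.Printf("field %d: key takes %d byte(s)\n", n, tagSize(n, 0))
	}
}
```

A single varint byte carries seven bits of payload. Three of them hold the wire type, leaving four bits for the tag number, so 15 is the largest tag that fits in one byte.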
If a field value is not set, a default value is used: zero for numeric types, the empty string for strings, and false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor for a field that has not been explicitly set always returns that field's default value. A repeated field may be repeated any number of times (including zero), and the order of the repeated values is preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
The Protocol Buffer Language Guide is a thorough explanation of how to write .proto files, including a list of all the possible field types. Don't go looking for features like class inheritance, though: protocol buffers don't do that.
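The default-value behavior is easy to see from Go. The sketch below uses a hand-written stand-in for a generated message, not real protoc-gen-go output, and it simplifies Phones to a slice of strings. The nil-safe getter style mirrors what the generated Go code provides.

```go
package main

import "fmt"

// Person is a simplified, hand-written stand-in for a generated message type.
type Person struct {
	Name   string
	Id     int32
	Phones []string // simplified: generated code would hold message pointers
}

// GetName returns the name, or the string default ("") when the message
// is nil or the field was never set.
func (p *Person) GetName() string {
	if p != nil {
		return p.Name
	}
	return ""
}

// GetId returns the id, or the numeric default (0).
func (p *Person) GetId() int32 {
	if p != nil {
		return p.Id
	}
	return 0
}

// GetPhones returns the repeated field, which is empty when unset.
func (p *Person) GetPhones() []string {
	if p != nil {
		return p.Phones
	}
	return nil
}

func main() {
	var unset *Person // no message at all
	fmt.Printf("%q %d %d\n", unset.GetName(), unset.GetId(), len(unset.GetPhones()))

	// A repeated field behaves like a dynamically sized array and keeps its order.
	p := &Person{Phones: []string{"555-0001", "555-0002"}}
	p.Phones = append(p.Phones, "555-0003")
	fmt.Println(p.GetPhones())
}
```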
Compiling Protocol Buffers
Now that you have a .proto, the next step is to generate the code you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, run the protocol buffer compiler protoc on your .proto. If you haven't installed the compiler, download the package and follow the instructions in the README. Then install the Go protobuf plugin with go install google.golang.org/protobuf/cmd/protoc-gen-go@latest.
The compiler plugin, protoc-gen-go, is installed in GOBIN, which defaults to GOPATH/bin. It must be on your PATH for the protocol compiler protoc to find it. Then run the compiler, giving it the source directory (SRC_DIR), the destination directory for the generated code (DST_DIR), and the path to your .proto. In this case, you would invoke protoc -I=$SRC_DIR --go_out=$DST_DIR $SRC_DIR/addressbook.proto.
You use the --go_out option because you want Go code; equivalent options are available for other supported languages. This generates the file addressbook.pb.go.
Protocol Buffer API
When you create addressbook.pb.go, you get the following helpful types:
- An AddressBook structure with a People field.
- A Person structure with Name, Id, Email, and Phones fields.
- A Person_PhoneNumber structure with fields for Number and Type.
- The type Person_PhoneType, with a value defined for each value in the Person.PhoneType enum.
The Go Generated Code guide has additional information on the specifics of what is generated, but for the most part, you can regard these as entirely regular Go types.
Here is an example of how you could create an instance of Person, taken from the unit tests for the list_people command:
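The following sketch shows what constructing such a value can look like. It is not the code from those unit tests. The types are hand-written stand-ins that mirror the generated names so the snippet compiles on its own; in a real project they come from the generated tutorialpb package and carry extra internal fields. The enum numbering and the sample values are assumptions.

```go
package main

import "fmt"

// Person_PhoneType stands in for the generated enum type.
type Person_PhoneType int32

// Enum values; the numbering is assumed for illustration.
const (
	Person_MOBILE Person_PhoneType = 0
	Person_HOME   Person_PhoneType = 1
	Person_WORK   Person_PhoneType = 2
)

// Person_PhoneNumber stands in for the nested PhoneNumber message.
type Person_PhoneNumber struct {
	Number string
	Type   Person_PhoneType
}

// Person stands in for the generated Person message.
type Person struct {
	Name   string
	Id     int32
	Email  string
	Phones []*Person_PhoneNumber
}

// AddressBook stands in for the generated AddressBook message.
type AddressBook struct {
	People []*Person
}

// newExamplePerson builds a Person with made-up sample data.
func newExamplePerson() *Person {
	return &Person{
		Id:    1234,
		Name:  "John Doe",
		Email: "jdoe@example.com",
		Phones: []*Person_PhoneNumber{
			{Number: "555-4321", Type: Person_HOME},
		},
	}
}

func main() {
	book := &AddressBook{}
	book.People = append(book.People, newExamplePerson())

	p := book.People[0]
	fmt.Println(p.Name, p.Id, p.Email, p.Phones[0].Number, p.Phones[0].Type)
}
```

Because the generated messages are ordinary structs, composite literals and append work on them just as they do on any other Go type.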
Write a Message Using Protobuf
The whole purpose of protocol buffers is to serialize your data so that it can be parsed elsewhere. In Go, you use the proto library's Marshal function to serialize your protocol buffer data. A pointer to a protocol buffer message's struct implements the proto.Message interface. Calling proto.Marshal returns the protocol buffer, encoded in its wire format. The add_person command, for example, uses this function.
Read a Message Using Protobuf
To decode an encoded message, use the proto library's Unmarshal function. Calling it parses the input data as a protocol buffer and stores the result in the message you pass in, here named book. The list_people command uses this function to parse the address book file.
Extending a Protocol Buffer
Sooner or later after you release the code that uses your protocol buffer, you will want to improve the protocol buffer's definition. If you want your new buffers to be backward-compatible and your old buffers to be forward-compatible, which you almost certainly do, there are some rules you need to follow. In the new version of the protocol buffer:
- You must not change the tag numbers of any existing fields.
- You may delete fields.
- You may add new fields, but you must use fresh tag numbers (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).
If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, deleted singular fields will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages.
However, keep in mind that new fields will not be present in old messages, so you will need to do something reasonable with the default value. A type-specific default value is used: for strings, the default is the empty string; for booleans, it is false; for numeric types, it is zero.
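Why old code can read new messages becomes clearer with a hand-rolled decoder. The sketch below is a simplified, standard-library-only illustration, not how the protobuf library is implemented. It assumes a message with name as field 1 and age as field 2, and it drops any field number it does not recognize, whereas the real library can retain unknown fields.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// person holds the only two fields this "old" decoder knows about.
type person struct {
	name string
	age  uint64
}

// decodePerson reads key/value pairs until the input is exhausted.
// The wire type in each key says how long the value is, so a field
// with an unfamiliar number can be skipped without a schema.
func decodePerson(b []byte) (person, error) {
	var p person
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return p, errors.New("malformed field key")
		}
		b = b[n:]
		field, wireType := key>>3, key&7

		switch wireType {
		case 0: // varint
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return p, errors.New("malformed varint")
			}
			if field == 2 {
				p.age = v
			}
			b = b[n:]
		case 2: // length-delimited (strings, bytes, nested messages)
			size, n := binary.Uvarint(b)
			if n <= 0 || size > uint64(len(b)-n) {
				return p, errors.New("malformed length")
			}
			end := n + int(size)
			if field == 1 {
				p.name = string(b[n:end])
			}
			b = b[end:]
		default:
			return p, fmt.Errorf("wire type %d not handled in this sketch", wireType)
		}
	}
	return p, nil
}

func main() {
	old := []byte{10, 6, 'J', 'a', 'i', 'n', 'e', 'y', 16, 33}
	// A "newer" writer appends a hypothetical field 3 (a string).
	newer := append(append([]byte{}, old...), 26, 6, 'j', '@', 'x', '.', 'i', 'o')

	for _, msg := range [][]byte{old, newer} {
		p, err := decodePerson(msg)
		fmt.Println(p, err)
	}
}
```

Both inputs decode to the same name and age. This is also why tag numbers must never be reused: the decoder identifies fields purely by number.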
Conclusion
- Protocol Buffers are similar to JSON or XML in that they store structured data that can be serialized and deserialized by many different languages.
- The key benefit of this format is that it is far smaller than formats like XML or even JSON. Google originally created the format because it operates at such a scale that every byte saved makes a difference.
- Built-in data formats, quick serialization/deserialization, versioning, and backward compatibility are all features of Protocol Buffers.
- A .proto file's definitions are straightforward: simply add a message for each data structure you wish to serialise, then define a name and a type for each field.