Is there a way to get the maximal size of a certain protobuf message after it will be serialized?

I'm referring to messages that don't contain "repeated" elements.

Note that I'm not referring to the size of a protobuf message with a specific content, but to the maximum possible size that it can get to (in the worst case).

3 Answers

In general, any Protobuf message can be any length due to the possibility of unknown fields.

If you are receiving a message, you cannot make any assumptions about the length.

If you are sending a message that you built yourself, then you can perhaps assume that it only contains fields you know about -- but then again, you can also easily compute the exact message size in this case.

Thus it's usually not useful to ask what the maximum size is.

With that said, you could write code that uses the Descriptor interfaces to iterate over the FieldDescriptors for a message type (MyMessageType::descriptor()).

See: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.descriptor

Similar interfaces exist in Java, Python, and probably others.

Here are the rules to implement:

Each field is composed of a tag followed by some data.

For the tag:

  • Field numbers 1-15 have a 1-byte tag.
  • Field numbers 16 and up have 2-byte tags.

For the data:

  • bool is always one byte.
  • int32, int64, uint64, and sint64 have a maximum data length of 10 bytes (yes, int32 can be 10 bytes if it is negative, unfortunately).
  • sint32 and uint32 have a maximum data length of 5 bytes.
  • fixed32, sfixed32, and float are always exactly 4 bytes.
  • fixed64, sfixed64, and double are always exactly 8 bytes.
  • Enum-typed fields' maximum length depends on the maximum enum value:
    • 0-127: 1 byte
    • 128-16383: 2 bytes
    • ... it's 7 bits per byte, but hopefully your enum isn't THAT big!
    • Also note that negative values will be encoded as 10 bytes, but hopefully there aren't any.
  • Message-typed fields' maximum length is the maximum length of the message type plus bytes for the length prefix. The length prefix is, again, one byte per 7 bits of integer data.
  • Groups (which you shouldn't be using; they're a decrepit old feature deprecated before protobuf was even released publicly) have a maximum size equal to the maximum size of the contents plus a second field tag (see above).
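A minimal Python sketch of the per-type data rules above. The type names are plain strings chosen for illustration (not protobuf library constants), and `varint_size`, `max_enum_data`, and `max_embedded_data` are hypothetical helpers. All sizes exclude the field tag.

```python
from typing import Sequence


def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("sign-extend negative values to 64 bits before sizing")
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


# Worst-case data length (tag not included) for scalar types, per the list above.
MAX_SCALAR_DATA = {
    "bool": 1,
    "int32": 10, "int64": 10, "uint64": 10, "sint64": 10,
    "uint32": 5, "sint32": 5,
    "fixed32": 4, "sfixed32": 4, "float": 4,
    "fixed64": 8, "sfixed64": 8, "double": 8,
}


def max_enum_data(enum_values: Sequence[int]) -> int:
    """Worst case for an enum field, given every numeric value the enum defines."""
    if min(enum_values) < 0:
        return 10  # negative values are sign-extended to 64 bits
    return varint_size(max(enum_values))


def max_embedded_data(max_content_len: int) -> int:
    """Worst case for a sub-message (or bounded string/bytes): length prefix + content."""
    return varint_size(max_content_len) + max_content_len
```

For example, `max_enum_data([0, 1, 200])` gives 2, and a sub-message whose own worst case is 200 bytes costs `max_embedded_data(200)`, i.e. 202 bytes, plus its tag.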

If your message contains any of the following, then its maximum length is unbounded:

  • Any field of type string or bytes. (Unless you know their max length, in which case, it's that max length plus a length prefix, like with sub-messages.)
  • Any repeated field. (Unless you know its max length, in which case, each element of the list has a max length as if it were a free-standing field, including tag. There is NO overall length prefix here. Unless you are using [packed=true], in which case you'll have to look up the details.)
  • Extensions.
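Putting the rules together, a rough Python sketch might look like this. `Field` and `MessageType` are hypothetical stand-ins for what you would read from the real descriptors (`MyMessageType::descriptor()` in C++, or the equivalent in Java/Python); groups, extensions, and `[packed=true]` are not modeled. Every field is counted as present, so oneof members are over-counted, which still yields a valid upper bound. As noted above, the bound only holds for messages you serialize yourself, since a received message may carry unknown fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


MAX_SCALAR_DATA = {
    "bool": 1,
    "int32": 10, "int64": 10, "uint64": 10, "sint64": 10,
    "uint32": 5, "sint32": 5,
    "fixed32": 4, "sfixed32": 4, "float": 4,
    "fixed64": 8, "sfixed64": 8, "double": 8,
}


@dataclass
class Field:
    """Hypothetical stand-in for a FieldDescriptor (not a protobuf class)."""
    number: int
    kind: str                                   # "int32", "enum", "string", "message", ...
    repeated: bool = False
    max_len: Optional[int] = None               # known content bound for string/bytes
    enum_values: List[int] = field(default_factory=list)
    message: Optional["MessageType"] = None     # nested type when kind == "message"


@dataclass
class MessageType:
    fields: List[Field]


def max_field_size(f: Field) -> Optional[int]:
    """Worst-case encoded size of one field including its tag, or None if unbounded."""
    if f.repeated:
        return None  # unbounded unless you also know the element count
    # The wire type sits in the low 3 bits, so it never changes the tag's varint length.
    tag = varint_size(f.number << 3)
    if f.kind in MAX_SCALAR_DATA:
        return tag + MAX_SCALAR_DATA[f.kind]
    if f.kind == "enum":
        return tag + (10 if min(f.enum_values) < 0 else varint_size(max(f.enum_values)))
    if f.kind in ("string", "bytes"):
        if f.max_len is None:
            return None
        return tag + varint_size(f.max_len) + f.max_len
    if f.kind == "message":
        inner = max_message_size(f.message)
        return None if inner is None else tag + varint_size(inner) + inner
    raise ValueError(f"unsupported field kind: {f.kind}")


def max_message_size(m: MessageType) -> Optional[int]:
    """Sum of per-field worst cases, or None if any field is unbounded."""
    total = 0
    for f in m.fields:
        size = max_field_size(f)
        if size is None:
            return None
        total += size
    return total


inner = MessageType([Field(1, "int32"), Field(2, "bool")])
outer = MessageType([
    Field(1, "message", message=inner),
    Field(16, "double"),
    Field(3, "string", max_len=20),
])
print(max_message_size(inner))  # 13
print(max_message_size(outer))  # 47
```

Returning `None` rather than a huge number keeps "unbounded" distinct from "large" for the caller.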
  • Are you sure int32 can take up to 10 bytes if negative? AFAIK any int32 encoded as a varint takes at most 5 bytes.
    – tigrou
    Commented Sep 15, 2016 at 14:30
  • @tigrou Yes, I'm sure, since I wrote the code. :) Negative int32s have to be sign-extended to 64 bits (10 bytes on the wire) because int32s are expected to be forwards-compatible with int64s, so that you can change an existing int32 field to int64 in the future if you need to.
    Commented Sep 15, 2016 at 20:24
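A small Python sketch of the point made in this comment exchange, assuming the standard base-128 varint encoding: an int32 is sign-extended to 64 bits before encoding, so -1 comes out identical to an int64 -1 (10 bytes), while sint32 uses ZigZag encoding and stays within 5 bytes. The function names are hypothetical helpers, not library APIs.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int64(value: int) -> bytes:
    # Two's complement in 64 bits, so any negative value needs all 10 bytes.
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_int32(value: int) -> bytes:
    # Sign-extended to 64 bits, exactly like int64, for forward compatibility.
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value: int) -> bytes:
    # ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... within 32 bits.
    return encode_varint(((value << 1) ^ (value >> 31)) & 0xFFFFFFFF)


print(len(encode_int32(-1)))                  # 10
print(encode_int32(-1) == encode_int64(-1))   # True
print(len(encode_int32(2**31 - 1)))           # 5
print(len(encode_sint32(-2**31)))             # 5
```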

As far as I know, there is no feature to calculate the maximum size in Google's own protobuf.

The nanopb generator computes the maximum size when possible and exports it as a #define in the generated file.

It is also quite simple to calculate manually for small messages, based on the protobuf encoding documentation.
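As an illustration of the manual approach, take a hypothetical message `Reading` with `int32 sensor_id = 1; bool ok = 2; double value = 16;`. The rules from the first answer give (1 + 10) + (1 + 1) + (2 + 8) = 23 bytes. This Python sketch hand-encodes a worst-case instance (a negative `sensor_id`) using the standard wire types (0 = varint, 1 = 64-bit) and checks that it really is 23 bytes.

```python
import struct

VARINT, I64 = 0, 1  # protobuf wire types for varint and 64-bit fixed fields


def encode_varint(value: int) -> bytes:
    """Base-128 varint encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


# Worst case for the hypothetical Reading message described above.
worst_case = (
    tag(1, VARINT) + encode_varint(-1 & 0xFFFFFFFFFFFFFFFF)  # negative int32: 10 bytes
    + tag(2, VARINT) + encode_varint(1)                      # bool: 1 byte
    + tag(16, I64) + struct.pack("<d", -1.5)                  # double: always 8 bytes
)

rule_based_max = (1 + 10) + (1 + 1) + (2 + 8)
print(len(worst_case), rule_based_max)  # 23 23
```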

  • Awesome! It would have been great if there were something that could do this at runtime, but I guess compile time will do... Gonna try that Python script next week and see if it does the job. Thanks!
    – traveh
    Commented Jun 19, 2015 at 14:15

While implementing Protocol Buffers 3 message size calculation, I found that most of what Kenton said is true. I did run into one oversight, though: a tag is built from the field number, left-shifted 3 bits and then bitwise ORed with the wire type (see wire_format_lite.h). That result is then encoded as a varint. So field numbers 16 through 2047 get 2-byte tags, but field numbers 2048 and above need 3 or more bytes (up to 5 bytes for the largest allowed field number). This probably isn't a problem for most proto3 users, since field numbers that large are unusual in practice.
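A short Python sketch of the tag-size rule described in this answer (the function names are hypothetical helpers). Because the wire type only occupies the low 3 bits, the tag's length depends only on the field number; 2^29 - 1 is the largest field number protobuf allows.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a varint (7 bits per byte)."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Encoded size of a field tag: varint of (field_number << 3) | wire_type."""
    if not 1 <= field_number <= 2**29 - 1:
        raise ValueError("field number out of range")
    if not 0 <= wire_type <= 7:
        raise ValueError("wire type must fit in 3 bits")
    return varint_size((field_number << 3) | wire_type)


# Tag length by field number:
#   1-15 -> 1 byte, 16-2047 -> 2, 2048-262143 -> 3,
#   262144-33554431 -> 4, 33554432-536870911 -> 5
for n in (15, 16, 2047, 2048, 262143, 262144, 33554431, 33554432, 2**29 - 1):
    print(n, tag_size(n))
```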
