Saturday, August 6, 2011

Zeromq benchmarking with large objects

At work, we're in the process of adding zeromq to our architecture. We plan to use it to make our C++ application multi-process (and eventually multi-computer).

We currently move data between modules by passing pointers to Protocol Buffer messages. We know that copying this data around with zeromq instead will be slower; we want to know by how much.

Our data starts out as protobuf messages, so it must be serialized before sending. The protos serialize into std::strings. From there, they are copied into zeromq messages and sent on a zeromq socket. The process is reversed on the receive side. 
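The copy steps in that pipeline can be timed in isolation. What follows is a minimal sketch, not our actual benchmark code: it assumes a plain byte vector as a stand-in for the zeromq message body, and the names MessageBuffer, time_ms, copy_into_message and copy_out_of_message are made up for illustration. It uses only the standard library, so it measures just the memory copies and none of the serialization or socket work.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical stand-in for a zeromq message body: a plain byte buffer.
using MessageBuffer = std::vector<char>;

// Time a single call of `step` in milliseconds using a monotonic clock.
template <typename Step>
double time_ms(Step&& step) {
    const auto start = std::chrono::steady_clock::now();
    step();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Send side: copy the serialized payload into a message buffer.
MessageBuffer copy_into_message(const std::string& serialized) {
    MessageBuffer message(serialized.begin(), serialized.end());
    assert(message.size() == serialized.size());
    return message;
}

// Receive side: copy the received bytes back into a std::string,
// ready to be parsed into a proto again.
std::string copy_out_of_message(const MessageBuffer& message) {
    return std::string(message.begin(), message.end());
}
```

To time the send-side copy for a payload of roughly our size, build a std::string of about 45MB, then call time_ms with a lambda that assigns the result of copy_into_message to a MessageBuffer. A single run is noisy, so averaging over several runs gives a more trustworthy number.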

Here is an example of our sending/receiving benchmarks on a large piece of data we use. Serialized, it is around 45MB in size.

It takes us 168.92ms to pass this one proto message. For comparison, the non-zeromq method (where a DataManager passes pointers around) takes 2.76ms.

Veteran zeromq users may notice that we're copying the serialized strings into the zeromq messages. These steps are the ones in red (and you thought the chart was colorful just because I like colors). This copying can be avoided in some use cases with zeromq's zero-copy functionality. If we were to take these copies out of our chain, the minimum total time would be 111.85ms.
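The idea behind zero-copy is an ownership handoff: instead of copying the bytes, the message borrows them and is given a callback to free them once the transport is finished. In the libzmq C API this is what zmq_msg_init_data does, taking a data pointer, a size, a free function and a hint. The sketch below models only that handoff with the standard library, so it runs without zeromq installed. BorrowedMessage, wrap_without_copy, release_string and finish are hypothetical names for illustration, not zeromq calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Same shape as the deallocation callback zeromq expects: (data, hint).
using FreeFn = void (*)(void* data, void* hint);

// Hypothetical stand-in for a zero-copy message: it borrows the bytes and
// remembers how to release them once the transport is done with them.
struct BorrowedMessage {
    void* data;
    std::size_t size;
    FreeFn free_fn;
    void* hint;
};

// Release callback: the hint carries the heap-allocated string that owns
// the bytes, so deleting it frees the payload.
void release_string(void* /*data*/, void* hint) {
    delete static_cast<std::string*>(hint);
}

// Wrap a serialized payload without copying its bytes. The caller should
// pass the string with std::move; it is then moved onto the heap so that it
// outlives the caller's scope.
BorrowedMessage wrap_without_copy(std::string serialized) {
    auto* owner = new std::string(std::move(serialized));
    return BorrowedMessage{&(*owner)[0], owner->size(), release_string, owner};
}

// What the transport would do once the bytes have been sent.
void finish(BorrowedMessage& message) {
    assert(message.free_fn != nullptr);
    message.free_fn(message.data, message.hint);
    message = BorrowedMessage{nullptr, 0, nullptr, nullptr};
}
```

The catch, and the reason this only works in some use cases, is that the sender gives up the string: once it has been handed over, nothing else may touch those bytes until the free callback has run.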

Using these benchmarks, we can decide which data-passing paths can be replaced with zeromq, and which are time-critical enough that they must keep passing pointers. A ~0.1 second slowdown is a significant amount of time for our processing, but the benefits zeromq provides (multi-process communication!) outweigh the drawbacks in many cases.
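For reference, the arithmetic behind that trade-off can be written down directly from the three timings quoted above. This is only a sketch of the comparison; the constant and function names are invented for illustration. With the copies in place, the zeromq path costs 166.16ms more than passing a pointer (roughly 61 times slower); without them it costs 109.09ms more (roughly 40 times slower), so the copy steps account for 57.07ms.

```cpp
#include <cassert>
#include <cstdio>

// Timings quoted in this post, in milliseconds.
constexpr double kPointerPassMs  = 2.76;    // DataManager passing pointers
constexpr double kZeromqMs       = 168.92;  // full zeromq path, with copies
constexpr double kZeromqNoCopyMs = 111.85;  // same path minus the copy steps

// Extra time a path costs compared with passing a pointer.
double extra_ms(double path_ms) {
    assert(path_ms >= kPointerPassMs);
    return path_ms - kPointerPassMs;
}

// How many times slower a path is than passing a pointer.
double slowdown_factor(double path_ms) {
    return path_ms / kPointerPassMs;
}

// Time attributable to copying strings into and out of zeromq messages.
double copy_cost_ms() {
    return kZeromqMs - kZeromqNoCopyMs;
}

// Print the comparison for both zeromq variants.
void print_summary() {
    std::printf("with copies:    +%.2f ms (%.1fx)\n",
                extra_ms(kZeromqMs), slowdown_factor(kZeromqMs));
    std::printf("without copies: +%.2f ms (%.1fx)\n",
                extra_ms(kZeromqNoCopyMs), slowdown_factor(kZeromqNoCopyMs));
    std::printf("copy steps:      %.2f ms\n", copy_cost_ms());
}
```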