
I am trying to implement a real-time application that involves IPC across different modules. The modules do some data-intensive processing. In the prototype I am using a message queue (ActiveMQ) as the backbone for IPC, which is easy (considering I am a total IPC newbie), but it is very, very slow.

Here is my situation:

  • I have isolated the IPC part so that I can swap it out for another mechanism in the future.
  • I have 3 weeks to implement another, faster version. ;-(
  • The IPC should be fast, but also comparatively easy to pick up.

I have been looking into different IPC approaches: sockets, pipes, and shared memory. However, I have no experience with IPC, and there is definitely no way I can afford to fail this demo in 3 weeks... Which IPC approach would be the safest to start with?

Thanks. Lily

4 Answers


You'll get the best results with a shared-memory solution.

I recently did the same IPC benchmarking, and I think my results will be useful to everyone who wants to compare IPC performance.

Pipe benchmark:

Message size:       128
Message count:      1000000
Total duration:     27367.454 ms
Average duration:   27.319 us
Minimum duration:   5.888 us
Maximum duration:   15763.712 us
Standard deviation: 26.664 us
Message rate:       36539 msg/s

FIFOs (named pipes) benchmark:

Message size:       128
Message count:      1000000
Total duration:     38100.093 ms
Average duration:   38.025 us
Minimum duration:   6.656 us
Maximum duration:   27415.040 us
Standard deviation: 91.614 us
Message rate:       26246 msg/s

Message Queues benchmark:

Message size:       128
Message count:      1000000
Total duration:     14723.159 ms
Average duration:   14.675 us
Minimum duration:   3.840 us
Maximum duration:   17437.184 us
Standard deviation: 53.615 us
Message rate:       67920 msg/s

Shared Memory benchmark:

Message size:       128
Message count:      1000000
Total duration:     261.650 ms
Average duration:   0.238 us
Minimum duration:   0.000 us
Maximum duration:   10092.032 us
Standard deviation: 22.095 us
Message rate:       3821893 msg/s

TCP sockets benchmark:

Message size:       128
Message count:      1000000
Total duration:     44477.257 ms
Average duration:   44.391 us
Minimum duration:   11.520 us
Maximum duration:   15863.296 us
Standard deviation: 44.905 us
Message rate:       22483 msg/s

Unix domain sockets benchmark:

Message size:       128
Message count:      1000000
Total duration:     24579.846 ms
Average duration:   24.531 us
Minimum duration:   2.560 us
Maximum duration:   15932.928 us
Standard deviation: 37.854 us
Message rate:       40683 msg/s

ZeroMQ benchmark:

Message size:       128
Message count:      1000000
Total duration:     64872.327 ms
Average duration:   64.808 us
Minimum duration:   23.552 us
Maximum duration:   16443.392 us
Standard deviation: 133.483 us
Message rate:       15414 msg/s
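For reference, here is roughly what the winning shared-memory approach looks like on POSIX systems. This is a minimal sketch, not the benchmark code itself: the region name /demo_shm, its size, and the single memcpy are illustrative assumptions, and a real message channel would add synchronization (a semaphore or a lock-free ring buffer) on top of the mapping.

// Minimal POSIX shared-memory writer (illustrative; error handling trimmed).
// Link with -lrt on older Linux systems. Name and layout are assumptions.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    const char*  name = "/demo_shm";
    const size_t size = 4096;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);      // create/open the shared region
    if (fd == -1) { perror("shm_open"); return 1; }
    if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }

    void* addr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    std::memcpy(addr, "hello from writer", 18);            // plain memory copy, no kernel round trip

    munmap(addr, size);
    close(fd);
    // shm_unlink(name) once both sides are finished with the region.
    return 0;
}

A reader process simply shm_open()s the same name, mmap()s it, and reads the bytes; the absence of a kernel round trip per message is what produces the numbers above.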

Been facing a similar question myself.

I've found the following pages helpful: "IPC performance: Named Pipe vs Socket" (in particular) and "Sockets vs named pipes for local IPC on Windows?".

It sounds like the consensus is that shared memory is the way to go if you're really concerned about performance, but if your current system is a message queue, it might be a rather... different structure. A socket and/or named pipe might be easier to implement, and if either meets your specs then you're done there.
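If you go the socket route, here is a minimal Unix-domain-socket sketch, just to show how little code it takes. Assumptions: the two "modules" are related processes created with fork() and a socketpair(); unrelated processes would bind()/connect() an AF_UNIX address instead, and error handling is trimmed.

// Minimal Unix domain socket IPC sketch (illustrative).
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) { perror("socketpair"); return 1; }

    if (fork() == 0) {                        // child: stands in for the other module
        close(sv[0]);
        char buf[128];
        ssize_t n = read(sv[1], buf, sizeof buf);
        if (n > 0) printf("child got: %.*s\n", (int)n, buf);
        close(sv[1]);
        _exit(0);
    }

    close(sv[1]);                             // parent: sends one message
    const char msg[] = "ping";
    write(sv[0], msg, sizeof msg - 1);
    close(sv[0]);
    wait(nullptr);
    return 0;
}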


On Windows, you can use WM_COPYDATA, a special kind of shared-memory-based IPC. This is an old but simple technique: "Process A" sends a message containing a pointer to some data in its memory and waits until "Process B" processes (sorry) the message, e.g. creates a local copy of the data. This method is pretty fast and works on the Windows 8 Developer Preview, too (see my benchmark). Any kind of data can be transported this way by serializing it on the sender side and deserializing it on the receiver side. It's also simple to implement sender and receiver message queues to make the communication asynchronous.
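A rough sketch of both sides is below; the window class name "ReceiverWindowClass" and the SendBlob helper are illustrative assumptions, not a complete Win32 program.

// Illustrative WM_COPYDATA sketch (Win32). Error handling trimmed.
#include <windows.h>

// Sender side: hands the receiver a pointer; Windows copies the data across.
bool SendBlob(HWND sender, const void* data, DWORD size) {
    HWND receiver = FindWindowA("ReceiverWindowClass", nullptr);  // locate the target window
    if (!receiver) return false;

    COPYDATASTRUCT cds;
    cds.dwData = 1;                        // app-defined tag describing the payload
    cds.cbData = size;
    cds.lpData = const_cast<void*>(data);
    // SendMessage blocks until the receiver's window procedure returns,
    // which is what makes sharing the pointer safe.
    return SendMessageA(receiver, WM_COPYDATA, (WPARAM)sender, (LPARAM)&cds) != 0;
}

// Receiver side: handle WM_COPYDATA in the window procedure.
LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam) {
    if (msg == WM_COPYDATA) {
        const COPYDATASTRUCT* cds = reinterpret_cast<const COPYDATASTRUCT*>(lParam);
        // Copy cds->cbData bytes from cds->lpData into local storage here;
        // the pointer is only valid for the duration of this call.
        return TRUE;
    }
    return DefWindowProcA(hwnd, msg, wParam, lParam);
}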

  • According to your benchmark, I just wondered why Win7 has such bad performance. Commented May 22, 2012 at 14:12
  • Because it's a netbook with a relatively slow, single-core Atom CPU.
    – kol
    Commented May 25, 2012 at 11:52

You may check out this blog post https://publicwork.wordpress.com/2016/07/17/endurox-vs-zeromq/

Basically, it compares Enduro/X, which is built on POSIX queues (kernel IPC queues), versus ZeroMQ, which can deliver messages simultaneously over several different transport classes, including tcp:// (network sockets), ipc://, inproc://, and pgm:///epgm:// for multicast.
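For context, the "kernel queues" side is the standard POSIX message queue API. A minimal sketch follows (the queue name /demo_mq and the attributes are assumptions, the send and receive would normally sit in separate processes, and you link with -lrt on Linux):

// Minimal POSIX message queue sketch (illustrative).
#include <mqueue.h>
#include <fcntl.h>
#include <cstdio>

int main() {
    mq_attr attr{};
    attr.mq_maxmsg  = 10;                   // queue depth
    attr.mq_msgsize = 128;                  // maximum message size in bytes

    mqd_t mq = mq_open("/demo_mq", O_CREAT | O_RDWR, 0600, &attr);
    if (mq == (mqd_t)-1) { perror("mq_open"); return 1; }

    const char msg[] = "ping";
    if (mq_send(mq, msg, sizeof msg, 0) == -1) perror("mq_send");

    char buf[128];                          // must be at least mq_msgsize bytes
    ssize_t n = mq_receive(mq, buf, sizeof buf, nullptr);
    if (n >= 0) printf("received: %s\n", buf);

    mq_close(mq);
    mq_unlink("/demo_mq");
    return 0;
}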

From the charts you can see that at some point, with larger data packets, Enduro/X running on queues wins over the sockets.

Both systems perform well at ~400,000 messages per second, but with 5 KB messages, kernel queues do better.

(Chart: Enduro/X vs ZeroMQ message throughput; source: https://publicwork.wordpress.com/2016/07/17/endurox-vs-zeromq/)


UPDATE: In answer to the comment below, I reran the test with ZeroMQ on ipc:// too; see the picture:

(Chart: rerun with ZeroMQ over ipc://; source: https://publicwork.wordpress.com/2016/07/17/endurox-vs-zeromq/)

As we can see, ZeroMQ over ipc:// is better, but in a certain range Enduro/X shows better results, and then ZeroMQ takes over again.

Thus I would say that the choice of IPC depends on the work you plan to do.

Note that ZeroMQ IPC runs on POSIX pipes, while Enduro/X runs on POSIX queues.
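And on the ZeroMQ side, a minimal request/reply pair over the ipc:// transport looks roughly like this (the endpoint path ipc:///tmp/demo.ipc is an assumption, in a real setup the REQ and REP sockets would live in different processes, and you compile with -lzmq):

// Minimal ZeroMQ ipc:// request/reply sketch (libzmq C API, illustrative).
#include <zmq.h>
#include <cstdio>

int main() {
    void* ctx = zmq_ctx_new();

    void* rep = zmq_socket(ctx, ZMQ_REP);   // "server" side, bound to the ipc endpoint
    zmq_bind(rep, "ipc:///tmp/demo.ipc");

    void* req = zmq_socket(ctx, ZMQ_REQ);   // "client" side, normally in another process
    zmq_connect(req, "ipc:///tmp/demo.ipc");

    zmq_send(req, "ping", 4, 0);            // REQ/REP enforces strict send/recv alternation

    char buf[16];
    int n = zmq_recv(rep, buf, sizeof buf, 0);
    if (n >= 0) printf("server got: %.*s\n", n, buf);
    zmq_send(rep, "pong", 4, 0);

    zmq_recv(req, buf, sizeof buf, 0);      // consume the reply

    zmq_close(req);
    zmq_close(rep);
    zmq_ctx_destroy(ctx);
    return 0;
}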

  • Let me ask: have you noticed that the cited test/comparison is not using ZeroMQ on the same transport class (it compares tcp:// to ipc://)? Would you be able to provide fair apples-to-apples comparison results where both Enduro/X and ZeroMQ use IPC? Commented Nov 15, 2016 at 11:35
  • See above, I have re-tested with ipc://
    – Madars Vi
    Commented Nov 15, 2016 at 13:29
  • +1 for taking care. How would Enduro/X work in scenarios where BLOBs are being processed in a distributed system with multiple transport classes mixed: tcp:// (for cluster-distributed SIGs) + inproc:// (for the fastest / lowest-latency in-process message passing) + epgm:// (for final content streaming)? How does performance scaling work once you add 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 peers under a given number of I/O threads (which the .Context() engine can operate on)? Commented Nov 15, 2016 at 15:43
  • Enduro/X uses TCP for bridging up to 32 cluster server nodes, where each cluster node does local IPC (as in the chart above) between processes. Processes are separate executable copies controlled by the application server (thus giving load balancing & fault tolerance if any binary dies). Enduro/X does not do any form of multicast (except the subscribe/publish event paradigm for XATMI services). Thus, if multicast for end devices is needed, the developer needs to create an adapter XATMI server or client that does the epgm streaming on its own. Check the readme here: github.com/endurox-dev/endurox
    – Madars Vi
    Commented Nov 15, 2016 at 16:01
  • To do what user3666197 asks, you might combine Enduro/X as the application server with the ipc:// and tcp:// transports, and use ZeroMQ for epgm://. Both systems support BLOB processing.
    – Madars Vi
    Commented Nov 15, 2016 at 16:08
