A first-order, rough-order-of-magnitude analysis is the first step.
It takes (nominally) one instruction to access one word of local memory. Assume it takes 1000 instructions to format a TCP/IP packet and queue it for transmission, and another 1000 instructions at the far end to receive, dequeue, and interpret it. Since an "access" requires two packets, one in each direction, you're talking a minimum of 4000 instructions per access.
At that point, your 4 GHz Intel screamer is, on non-local accesses, performing about like a 4 MHz 8086.
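The arithmetic above can be sketched in a few lines of Python. The constants are the article's stated assumptions (1000 instructions per send, 1000 per receive, two packets per access), not measurements:

```python
# Back-of-envelope estimate of remote-access cost.
# All constants are assumptions from the text, not measured values.

SEND_OVERHEAD = 1000        # instructions to format a TCP/IP packet and queue it
RECEIVE_OVERHEAD = 1000     # instructions to receive, dequeue, and interpret it
PACKETS_PER_ACCESS = 2      # request out, reply back

instructions_per_remote_access = PACKETS_PER_ACCESS * (SEND_OVERHEAD + RECEIVE_OVERHEAD)
print(instructions_per_remote_access)   # 4000

# A 4 GHz machine spending 4000 instructions per remote "access" behaves,
# on those accesses, like a one-instruction-per-access machine clocked at:
cpu_hz = 4e9
effective_hz = cpu_hz / instructions_per_remote_access
print(effective_hz / 1e6, "MHz")        # 1.0 MHz -- 8086 territory
```

The 4 MHz 8086 comparison is deliberately loose: the effective rate works out to about 1 MHz of single-instruction accesses, and a real 8086 took several cycles per instruction, so the two land in the same order of magnitude.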
That's before you consider network overhead. Assume 100 physical machines ("instances") and 16-port routers. You need 8 routers: one at the center of a star, connected to the other seven, with each of those seven connected to 14 or 15 machines (one port on each leaf router is used for the uplink). Assuming that non-local accesses are uniformly distributed, each non-local access takes three router hops: one to the local router, one to the star center, one back down to a remote leaf router. (Because you assume uniform distribution, only about 1/7 of your accesses will be to a machine sharing your local router. For a first-order analysis, you can ignore that lucky break.) Because you have to go there and back, you're talking 6 router hops and 4000 instructions, NOT INCLUDING ROUTER OPERATIONS, to do one non-local access.
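The topology math follows the same pattern. A short sketch, using the assumed 100 machines and 16-port routers from the text:

```python
import math

MACHINES = 100
PORTS_PER_ROUTER = 16

# Star of routers: one center router, plus leaf routers that each
# spend one port on the uplink and the rest on machines.
machine_ports_per_leaf = PORTS_PER_ROUTER - 1                 # 15
leaf_routers = math.ceil(MACHINES / machine_ports_per_leaf)   # 7
total_routers = leaf_routers + 1                              # 8, as in the text

# A non-local access to a machine under a different leaf router:
hops_one_way = 3                       # local leaf -> center -> remote leaf
hops_round_trip = 2 * hops_one_way     # 6 router hops per access

# The "lucky break": roughly 1/7 of uniformly distributed remote
# accesses stay under your own leaf router and skip the center.
same_leaf_fraction = 1 / leaf_routers
```

Note the ceiling division: 100 machines over 15 usable ports per leaf gives 7 leaf routers, and the center router makes 8.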
First-order rough-order-of-magnitude analysis would seem to indicate that your application will die of old age waiting on packets in the router queues.