
I am trying to develop a computing application that needs a lot of memory (>500 GB). Buying a single machine with that much memory is overly expensive. I can, though, buy ~100 small instances on Digital Ocean or similar, divide the memory into blocks, and use TCP to emulate shared memory between the instances.

Now, my question is: how can I measure/predict the time it takes for two processes on two different machines like that to share information, compared to IPC and shared memory on a single machine? Are there rules of thumb? I don't want exact values, but knowing roughly how much faster one is than the other would be very helpful in assessing the feasibility of this approach.
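For concreteness, the kind of measurement I have in mind is a simple TCP ping-pong between two instances: run an echo server on one machine, time round trips from another, and compare against the same loop over a local array or a local IPC socket. The host, port, message size, and iteration count below are placeholder assumptions, not values from my actual setup.

    import socket
    import time

    PEER = ("10.0.0.2", 5000)   # placeholder: private address of the peer instance
    MSG = b"x" * 64             # roughly one "word plus headers" sized payload
    N = 10_000                  # round trips to average over

    def echo_server(port=5000):
        # Run on one instance: echo every message straight back.
        with socket.create_server(("", port)) as srv:
            conn, _ = srv.accept()
            with conn:
                while data := conn.recv(len(MSG)):
                    conn.sendall(data)

    def client():
        # Run on the other instance: time N request/reply round trips.
        with socket.create_connection(PEER) as s:
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            t0 = time.perf_counter()
            for _ in range(N):
                s.sendall(MSG)
                s.recv(len(MSG))
            rtt = (time.perf_counter() - t0) / N
            print(f"average round trip: {rtt * 1e6:.1f} microseconds")

The ratio of that round-trip time to the time for a plain local read would be exactly the rule of thumb I am after.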

2 Answers


This sounds dubious: the network I/O latency would erase most of the benefit of in-memory data access on each individual machine. I wonder whether your approach would even be faster than buying a single machine with a lot of SSDs used as virtual memory, or even a machine running on virtual memory backed by a regular HDD.

Secondly, you would have to redesign your logic before turning your system into a distributed one. You cannot simply split one big process into smaller ones and then couple those smaller ones so tightly over the network. The coupling and sharing should happen only on end results (or on special intermediate results that are complete enough to live on the file system), not on results so intermediate and temporary that they stay (or could stay) in memory.


A first-order, rough-order-of-magnitude analysis is the first step.

It takes (nominally) one instruction to access one word of local memory. Assuming it takes 1000 instructions to format a TCP/IP packet and queue it for transmission, and another 1000 instructions at the far end to receive, dequeue, and interpret it, and realizing that an "access" requires two packets, one in each direction, you're talking a minimum of 4000 instructions per access.

At that point, your 4 GHz Intel screamer is, on non-local accesses, performing about like a 4 MHz 8086.
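Put as arithmetic (the per-packet instruction counts are the assumptions above, not measurements):

    # Back-of-envelope slowdown estimate from the assumed instruction counts.
    SEND_INSTR = 1000            # format a TCP/IP packet and queue it for transmission
    RECV_INSTR = 1000            # receive, dequeue, and interpret it at the far end
    PACKETS_PER_ACCESS = 2       # request out, reply back

    instr_per_remote_access = PACKETS_PER_ACCESS * (SEND_INSTR + RECV_INSTR)  # 4000
    cpu_hz = 4_000_000_000       # a 4 GHz core, nominally one instruction per cycle
    accesses_per_second = cpu_hz / instr_per_remote_access                    # ~1,000,000

    print(instr_per_remote_access, "instructions per non-local access")
    print(accesses_per_second, "non-local accesses per second -- roughly 8086-class throughput")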

That's before you consider network overhead. Assuming 100 physical machines ("instances") and 16-port routers, you need 8 routers: one at the center of a star, connected to the other seven, with each of those seven connected to roughly 16 machines. Assuming non-local accesses are uniformly distributed, each non-local access takes three router hops: one to the local router, one to the star center, and one back down to the destination machine's local router. (Because of the uniform distribution, only about 1/7 of your accesses will be to a machine sharing your local router; for a first-order analysis, you can ignore that lucky break.) Because you have to go there and back, you're talking 6 router hops and 4000 instructions, NOT INCLUDING ROUTER OPERATIONS, for one non-local access.
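The same tally for the network side, with a purely illustrative per-hop delay (the 50-microsecond figure is an assumption for queueing plus forwarding, not a measurement):

    # Hop budget for one non-local access under the star-of-routers layout above.
    HOPS_ONE_WAY = 3                     # machine -> leaf router -> core router -> leaf router
    HOPS_ROUND_TRIP = 2 * HOPS_ONE_WAY   # there and back = 6 hops
    PER_HOP_DELAY_US = 50                # assumed queueing + forwarding delay per hop

    network_delay_us = HOPS_ROUND_TRIP * PER_HOP_DELAY_US
    print(HOPS_ROUND_TRIP, "router hops,", network_delay_us,
          "microseconds on the network alone, before the ~4000 instructions of protocol work")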

First-order rough-order-of-magnitude analysis would seem to indicate that your application will die of old age waiting on packets in the router queues.
