25

This might sound like a weird question, but in my department we are having trouble with the following situation:

We are working on a server application which is growing larger and larger, to the point that we are considering splitting it into different parts (DLL files), dynamically loading them when needed and unloading them afterwards, in order to handle the performance issues.

But: the functions we are using pass input and output parameters as STL objects, and as mentioned in a Stack Overflow answer, this is a very bad idea. (The post contains some partial solutions and hacks, but none of it looks very solid.)

Obviously we could replace the input/output parameters with plain built-in types and construct the STL objects from those once inside the functions, but this might cause performance drops.

Is it OK to conclude that, if you are considering building an application which might grow so large that a single PC can't handle it anymore, you must not use the STL at all?

More background about this question:
There seem to be some misunderstandings about the question, so here is the issue:
My application uses a huge amount of resources (CPU, memory) to complete its work, and I would like to split this work into different parts. As the program is already split into multiple functions, it's not that difficult to create some DLLs out of my application and put some of those functions in the export tables of the DLLs. This would result in the following situation:

+-----------+-----------+----
| Machine1  | Machine2  | ...
| App_Inst1 | App_Inst2 | ...
|           |           |    
| DLL1.1    | DLL2.1    | ...
| DLL1.2    | DLL2.2    | ...
| DLL1.x    | DLL2.x    | ...
+-----------+-----------+----

App_Inst1 is the instance of the application installed on Machine1, while App_Inst2 is the instance of the same application installed on Machine2.
DLL1.x is a DLL installed on Machine1, while DLL2.x is a DLL installed on Machine2.
DLLx.1 covers exported function1.
DLLx.2 covers exported function2.

Now on Machine1 I'd like to execute function1 and function2. I know that this will overload Machine1, so I'd like to send a message to App_Inst2, asking that application instance to perform function2.

The input/output parameters of function1 and function2 are STL (C++ Standard Template Library) objects, and I regularly expect the customer to update App_Inst1, App_Inst2, and DLLx.y (but not necessarily all of them: the customer might upgrade Machine1 but not Machine2, or only upgrade the applications but not the DLLs, or vice versa). Obviously, if the interface (the input/output parameters) changes, the customer is forced to do a complete upgrade.

However, as mentioned in the Stack Overflow answer referred to above, a simple recompilation of App_Inst1 or one of the DLLs might cause the whole system to fall apart, hence the original title of this post, advising against the use of the STL (C++ Standard Template Library) for large applications.

I hope this clears up some of the questions/doubts.

18
  • 44
    Are you sure you are having performance issues because of your executable size? Can you add some details about whether it is realistic to assume all your software is compiled with the same compiler (for example in one go on the build server) or if you actually want to split into independent teams?
    – nvoigt
    Commented May 22, 2018 at 7:57
  • 5
    Basically you need a person whose dedicated job is "build manager" and "release manager", to ensure that all C++ projects are being compiled on the same compiler version and with identical C++ compiler settings, compiled from a consistent snapshot (version) of source code, etc. Typically this is taken care of under the banner of "continuous integration". If you search online you will find lots of articles and tools. Outdated practices can self-reinforce - one outdated practice can lead to all practices being outdated.
    – rwong
    Commented May 22, 2018 at 9:24
  • 8
    The accepted answer in the linked question states that the problem is with C++ calls in general. So "C++ but not STL" doesn't help, you need to go with bare C to be on the safe side (but also see the answers, serialization is likely a better solution).
    – Frax
    Commented May 22, 2018 at 14:03
  • 52
"dynamically loading when needed and unloading afterwards, in order to be able to handle the performance issues" - What "performance issues"? I don't know of any issues other than using too much memory that can be fixed by unloading things like DLLs from memory - and if that's the problem, the easiest fix is to just buy more RAM. Have you profiled your application to identify the actual performance bottlenecks? Because this sounds like an XY problem - you have unspecified "performance issues" and someone has already decided on the solution. Commented May 22, 2018 at 14:20
  • 4
    @MaxBarraclough "The STL" is perfectly well accepted as an alternate name for the templated containers and functions that have been subsumed into the C++ Standard Library. In fact the C++ Core Guidelines, written by Bjarne Stroustrup and Herb Sutter, repeatedly make reference to "the STL" when talking about these. You cannot get a much more authoritative source than that. Commented May 23, 2018 at 9:53

7 Answers

113

This is a stone-cold classic X-Y problem.

Your real problem is performance. However, your question makes it clear that you've done no profiling or other evaluation of where the performance issues actually come from. Instead, you're hoping that splitting your code into DLLs will magically solve the problem (which it won't, for the record), and now you're worried about one aspect of that non-solution.

Instead, you need to solve the real problem. If you have multiple executables, check which one is causing the slow-down. While you're at it, make sure it actually is your program taking all the processing time, and not a badly-configured Ethernet driver or something like that. And after that, start profiling the various tasks in your code. The high-precision timer is your friend here. The classic solution is to monitor average and worst-case processing times for a chunk of code.
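
For illustration only, here is a minimal sketch of that kind of timing instrumentation with std::chrono; processWorkItem is a hypothetical stand-in for one of your own functions, not anything from your code base:

    #include <chrono>
    #include <cstdio>

    // Hypothetical stand-in for one of the functions being profiled.
    void processWorkItem() {
        volatile double x = 0.0;
        for (int i = 0; i < 100000; ++i) x = x + i * 0.5;  // dummy workload
    }

    int main() {
        using clock = std::chrono::steady_clock;

        std::chrono::microseconds total{0};
        std::chrono::microseconds worst{0};
        constexpr int runs = 1000;

        for (int i = 0; i < runs; ++i) {
            const auto start = clock::now();
            processWorkItem();
            const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
                clock::now() - start);
            total += elapsed;
            if (elapsed > worst) worst = elapsed;   // track the worst case as well as the average
        }

        std::printf("average: %lld us, worst: %lld us\n",
                    static_cast<long long>((total / runs).count()),
                    static_cast<long long>(worst.count()));
    }

A proper profiler will give you a call-level breakdown, but even crude numbers like these tell you whether the hot spot is where you think it is.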

When you've got data, you can work out how to deal with the problem, and then you can work out where to optimise.

3
  • 55
    "Instead you're hoping that splitting your code into DLLs will magically solve the problem (which it won't, for the record)" -- +1 for this. Your operating system almost certainly implements demand paging which achieves exactly the same result as loading and unloading functionality in DLLs, only automatically rather than requiring manual intervention. Even if you are better at predicting how long a piece of code should hang around once used than the OS virtual memory system is (which is actually unlikely), the OS will cache the DLL file and negate your efforts anyway.
    – Jules
    Commented May 22, 2018 at 20:43
  • @Jules See update - they've clarified that the DLLs exist only on separate machines, so I can maybe see this solution working. There's now communication overhead though, so hard to be sure.
    – Izkata
    Commented May 23, 2018 at 21:57
  • 2
    @Izkata - it's still not entirely clear, but I think what's described is that they want to dynamically select (based on runtime configuration) a version of each function that is either local or remote. But any part of the EXE file that is never used on a given machine simply won't ever be loaded into memory, so the use of DLLs for this purpose is not necessary. Just include both versions of all functions in the standard build, and create a table of function pointers (or C++ callable objects, or whatever method you'd prefer) to invoke the appropriate version of each function.
    – Jules
    Commented May 24, 2018 at 0:43
39

If you have to split up a piece of software between multiple physical machines, you need some form of serialization when passing data between those machines, as only in rare cases can you just send the exact same binary representation across. Most serialization methods have no problem handling STL types, so that case is not something that would worry me.

If you have to split up an application into shared libraries (DLLs) (and before doing that for performance reasons, you really should make sure it would actually solve your performance problems), passing STL objects can be a problem, but it does not have to be. As the link you provided already describes, passing STL objects works if you use the same compiler and the same compiler settings. If users provide the DLLs, you might not be able to count on this. If you provide all the DLLs and compile everything together, however, then you can count on it, and using STL objects across DLL boundaries becomes very much possible. You still have to watch out for your compiler settings so that you do not end up with multiple different heaps if you pass object ownership, though that is not an STL-specific problem.
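
To illustrate the ownership point, here is a hedged, minimal sketch of a C-style DLL boundary that keeps allocation and deallocation inside the same module (the names are made up, and __declspec(dllexport) assumes MSVC-style exports):

    // Hypothetical C-style boundary for one DLL.
    // Rule of thumb: the module that allocates an object is also the one that frees it,
    // so the EXE and the DLL never touch each other's heap.
    #include <cstddef>
    #include <numeric>
    #include <vector>

    struct Widget { std::vector<double> data; };   // STL stays inside the DLL

    extern "C" __declspec(dllexport) Widget* widget_create(std::size_t size) {
        return new Widget{ std::vector<double>(size, 1.0) };
    }

    extern "C" __declspec(dllexport) double widget_sum(const Widget* w) {
        return std::accumulate(w->data.begin(), w->data.end(), 0.0);
    }

    extern "C" __declspec(dllexport) void widget_destroy(Widget* w) {
        delete w;                                  // freed by the same module that allocated it
    }

The EXE only ever sees an opaque Widget pointer and the three exported functions, so mismatched runtimes or heaps never get a chance to bite.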

3
  • 1
    Yes, and especially the part about passing allocated objects across DLL/so boundaries. Generally speaking the only way to absolutely avoid the multiple allocator problem is to ensure that the DLL/so (or library!) that allocated the structure also frees it. Which is why you see lots and lots of C-style APIs written this way: an explicit free API for each API passing back an allocated array/struct. The additional problem with STL is that the caller might expect to be able to modify the passed complex data structure (add/remove elements) and that too can't be allowed. But it's hard to enforce.
    – davidbak
    Commented May 22, 2018 at 17:51
  • 1
If I had to split an application like this, I'd probably use COM, but this generally increases code size as every component brings its own C and C++ libraries (which can be shared when they are the same, but can diverge when necessary, e.g. during transitions). I'm not convinced that this is the appropriate course of action for the OP's problem, though. Commented May 22, 2018 at 18:13
  • 2
As a specific example, the program is highly likely somewhere to want to send some text to another machine. At some point, there is going to be a pointer to some characters involved in representing that text. You absolutely can't just transmit the bits of those pointers and expect defined behaviour on the receiving side.
    – Caleth
    Commented May 23, 2018 at 14:10
20

We are working on a server application which is growing larger and larger, to the point that we are considering splitting it into different parts (DLLs), dynamically loading them when needed and unloading them afterwards, in order to handle the performance issues

RAM is cheap, and therefore inactive code is cheap. Loading and unloading code (especially unloading) is a fragile process and is unlikely to have a significant effect on your program's performance on modern desktop/server hardware.

Cache is more expensive, but it only affects code that has been active recently, not code that is sitting in memory unused.

In general, programs outgrow their computers because of data size or CPU time, not code size. If your code size is getting so big that it causes major problems, then you probably want to look at why that is happening in the first place.

But: the functions we are using pass input and output parameters as STL objects, and as mentioned in this Stack Overflow URL, this is a very bad idea.

It should be OK as long as the DLLs and the executable are all built with the same compiler and dynamically linked against the same C++ runtime library. It follows that if the application and its associated DLLs are built and deployed as a single unit, it shouldn't be a problem.

Where it can become a problem is when the libraries are built by different people or can be updated separately.

Is it OK to conclude that, if you are considering building an application which might grow so large that a single PC can't handle it anymore, you must not use the STL at all?

Not really.

Once you start spreading an application across multiple machines, you have a whole load of considerations as to how you pass data between those machines. The details of whether STL types or more basic types are used are likely to be lost in the noise.
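
For example, here is a deliberately minimal, hand-rolled sketch of what "passing data between machines" boils down to: a length-prefixed buffer of fixed-width integers (endianness and versioning left aside). In practice an existing serialization library would handle this, but the point is that the receiver only has to agree on the bytes, not on the sender's STL implementation:

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Serialize a std::vector<std::int32_t> into a length-prefixed byte buffer.
    std::vector<std::uint8_t> serialize(const std::vector<std::int32_t>& values) {
        const std::uint32_t count = static_cast<std::uint32_t>(values.size());
        std::vector<std::uint8_t> buffer(sizeof(count) + count * sizeof(std::int32_t));
        std::memcpy(buffer.data(), &count, sizeof(count));
        if (count != 0)
            std::memcpy(buffer.data() + sizeof(count), values.data(),
                        count * sizeof(std::int32_t));
        return buffer;                    // these bytes go over the wire
    }

    // Rebuild the vector on the receiving side from the same byte layout.
    std::vector<std::int32_t> deserialize(const std::vector<std::uint8_t>& buffer) {
        std::uint32_t count = 0;
        std::memcpy(&count, buffer.data(), sizeof(count));
        std::vector<std::int32_t> values(count);
        if (count != 0)
            std::memcpy(values.data(), buffer.data() + sizeof(count),
                        count * sizeof(std::int32_t));
        return values;
    }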

3
  • 2
    Inactive code likely never gets loaded into RAM in the first place. Most operating systems only load pages from executables if they're actually required.
    – Jules
    Commented May 22, 2018 at 20:53
  • 1
    @Jules: If dead code is mixed with live code (with page-size = 4k granularity), then it will be mapped + loaded. Cache works on much finer (64B) granularity, so it's still mostly true that unused functions don't hurt much. Each page needs a TLB entry, though, and (unlike RAM) that is a scarce runtime resource. (File-backed mappings typically don't use hugepages, at least not on Linux; One hugepage is 2MiB on x86-64, so you can cover a lot more code or data without getting any TLB misses with hugepages.) Commented May 24, 2018 at 8:04
  • 1
    What @PeterCordes notes: So, be sure to use “PGO” as part of your build-for-release process!
    – JDługosz
    Commented May 25, 2018 at 23:18
14

No, I don't think that conclusion follows. Even if your program is distributed across multiple machines, there's no reason that using the STL internally forces you to use it in inter-module/process communication.

In fact, I'd argue that you should separate the design of external interfaces from the internal implementation from the start, as the former will be more solid and harder to change than what's used internally.
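
A hedged sketch of that separation (count_words is a made-up example, not anything from the question): the boundary exposes only plain C-compatible types, while the body uses the STL freely.

    #include <cstddef>
    #include <sstream>
    #include <string>

    // External interface: plain, C-compatible types only, so it can be kept
    // stable and versioned independently of how the inside evolves.
    extern "C" int count_words(const char* text, std::size_t length);

    // Internal implementation: free to use the STL; callers never see it.
    extern "C" int count_words(const char* text, std::size_t length) {
        std::istringstream stream{std::string(text, length)};
        std::string word;
        int count = 0;
        while (stream >> word) ++count;
        return count;
    }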

7

You're missing the point of that question.

There are basically two kinds of DLLs: your own, and somebody else's. The "STL problem" is that you and they may not be using the same compiler. Obviously, that is not a problem for your own DLLs.

5

If you build the DLLs from the same source tree at the same time with the same compiler and build options, then it will work OK.

However, the "Windows-flavoured" way of splitting an application into multiple pieces, some of which are re-usable, is COM components. These can be small (individual controls or codecs) or large (IE is available as a COM control, in mshtml.dll).

dynamically loading when needed and unloading afterwards

For a server application, this is probably going to have terrible efficiency; it's only really viable when you have an application that moves through multiple phases over a long period of time so that you know when something isn't going to be needed again. It reminds me of DOS games using the overlay mechanism.

Besides, if your virtual memory system is working properly, it will handle this for you by paging out unused code pages.

might grow so large that a single PC can't handle it anymore

Buy a bigger PC.

Don't forget that with the right optimisation, a laptop can outperform a Hadoop cluster.

If you really need multiple systems, you have to think very carefully about the boundary between them, since that's where the serialisation cost is. This is where you should start looking at frameworks like MPI.
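
For reference, a minimal MPI sketch (this assumes an MPI implementation such as Open MPI or MPICH is installed, and is purely illustrative, not taken from the question's code): rank 0 sends a buffer of doubles to rank 1.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        std::vector<double> data(1024, 1.0);

        if (rank == 0) {
            // The serialisation cost lives here: the raw doubles cross the boundary.
            MPI_Send(data.data(), static_cast<int>(data.size()), MPI_DOUBLE,
                     /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(data.data(), static_cast<int>(data.size()), MPI_DOUBLE,
                     /*source=*/0, /*tag=*/0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }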

6
  • 1
    "it's only really viable when you have an application that moves through multiple phases over a long period of time so that you know when something isn't going to be needed again" -- even then it's unlikely to help much, because the OS will cache the DLL files, which will likely end up taking more memory than just including the functions directly in your base executable. Overlays are only useful in systems without virtual memory, or when virtual address space is the limiting factor (I presume this application is 64-bit, not 32...).
    – Jules
    Commented May 22, 2018 at 20:46
  • 3
    "Buy a bigger PC" +1. You can now acquire systems with multiple terabytes of RAM. You can hire one from Amazon for less than the hourly rate of a single developer. How much developer time are you going to spend optimizing your code to reduce memory usage?
    – Jules
    Commented May 22, 2018 at 20:48
  • 2
The biggest problem I've faced with "buy a bigger PC" was related to the question "how far will your app scale?". My answer was "How much are you willing to spend on a test? Because I expect it to scale so far that renting a proper machine and setting up a properly large test is going to cost thousands of dollars. None of our customers is even close to what a single-CPU PC can do." Many older programmers have no realistic idea how much PCs have grown up; the video card alone in a modern PC is a supercomputer by 20th-century standards.
    – MSalters
    Commented May 22, 2018 at 22:00
  • COM components? In the 1990s maybe, but now? Commented May 22, 2018 at 23:32
  • @MSalters - right... anyone with any questions about how far an application can scale on a single PC should look at the specs for the Amazon EC2 x1e.32xlarge instance type - 72 physical processor cores total in the machine providing 128 virtual cores at 2.3GHz (burstable to 3.1GHz), potentially as much as 340GB/s memory bandwidth (depending on what kind of memory is installed, which isn't described in the spec), and 3.9TiB of RAM. It has enough cache to run most applications without ever touching the main RAM. Even without a GPU it's as powerful as a 500-node supercomputer cluster from 2000.
    – Jules
    Commented May 24, 2018 at 1:12
0

We are working on a server application which is growing larger and larger, to the point that we are considering splitting it into different parts (DLL files), dynamically loading them when needed and unloading them afterwards, in order to handle the performance issues.

The first part makes sense (splitting the application across different machines for performance reasons).

The second part (loading and unloading libraries) does not make sense, as this is extra effort that will not (really) improve things.

The problem you are describing is better solved with dedicated computation machines, but these should not be running the same (main) application.

The classic solution looks like this:

[user] [front-end] [machine1] [common resources]
                   [machine2]
                   [machine3]

Between the front-end and the computation machines you may have extra components, such as load balancers and performance monitoring, and keeping specialized processing on dedicated machines is good for caching and throughput optimizations.

This in no way implies extra loading/unloading of DLLs, nor anything to do with the STL.

That is, use the STL internally as required, and serialize your data between the elements (see gRPC and Protocol Buffers and the kinds of problems they solve).
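
If a hand-rolled format is used instead of such a framework, a versioned header helps with the partial-upgrade scenario described in the question; here is a hypothetical sketch (field names are illustrative, not part of any existing protocol):

    #include <cstdint>

    // Versioned, fixed-layout request header for a hand-rolled wire format, so
    // App_Inst1 and App_Inst2 can be upgraded independently: the receiver checks
    // the version before interpreting the payload bytes that follow.
    struct RequestHeader {
        std::uint32_t version;       // bumped whenever the payload layout changes
        std::uint32_t function_id;   // e.g. 1 = function1, 2 = function2
        std::uint64_t payload_size;  // number of payload bytes that follow
    };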

That said, with the limited information you have provided, this does look like the classic X-Y problem (as @Graham said).
