
The short version: I need to find all n-tuples of {0,1} which do not fail to satisfy a set of (non-linear polynomial) equations, without hogging all of my memory by generating every possible n-tuple before eliminating the bad ones. This set of 'not bad' n-tuples should be substantially smaller than the set of all possible n-tuples for the problem I am working on.

The long version:

I have a system of non-linear (polynomial) equations I want to solve; indeed, multiple such systems, each one corresponding to a separate case of interest in a larger problem. Simply using Mathematica's built-in Solve function isn't very effective: it quickly consumes memory and then runs at a snail's pace trying to work its way through the problem without consuming more. Thankfully, the systems I'm working with have a couple of nice properties which permit a more efficient workaround (indeed, my current method converts what could have taken days with Solve into mere minutes).

The property of relevance here is that, if I call the variables in the system d[i], it turns out there are a fair number of equations of the form d[i]^2==d[i], which means all of those variables have to be either 0 or 1. I am currently using Tuples to generate every possible substitution, and then eliminating all those which cannot satisfy the equations. However, this becomes extremely memory-intensive very quickly. Even 'small examples' lead to insufficient-memory errors at this stage, as they can have up to a few dozen of these squaring relations in each list of equations (and there can be a half-dozen or more such lists in a 'small' example). There may be a number of ways to improve the efficiency of my code, as I'm still fairly new to advanced Mathematica programming, but mostly I am looking for ways to keep the memory demands down without drastically inflating computation time.
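As a tiny illustration of why those squaring relations pin the variables to 0/1 (a hypothetical one-variable case, not one of my actual equations):

```mathematica
(* each squaring relation d[i]^2 == d[i] admits only the values 0 and 1 *)
Solve[d[1]^2 == d[1], d[1]]
(* {{d[1] -> 0}, {d[1] -> 1}} *)
```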

Before getting to this stage, I generate a nested list of equations I'm interested in, denoted CubeReduced2 (the equations end up being degree-2 multinomials; the reason for the 'cube' is not particularly germane here). The equations are all stored on the final level; the preceding levels effectively serve as markers of which case I'm looking at, and do not store any other data. I am attempting to solve many possible systems without having to do them individually, especially when I have no advance knowledge of how many systems there will be.

This is the code I am currently using to generate all possible combinations of 0/1 substitutions for these variables:

SelfSquares = Flatten[Map[Reap[Do[If[MemberQ[#, d[i]^2 == d[i]], Sow[i]], 
{i, 1, varnum}]][[2]]&, CubeReduced2, {2}], {{1}, {2}, {3, 4}}];
SquarePossibilities = 
 ParallelTable[
  Tuples[{0, 1}, Length[SelfSquares[[m]][[i]]]], {m, 1, 
   Length[CubeReduced2]}, {i, 1, Length[CubeReduced2[[m]]]}];
SquareReplaces = 
 ParallelTable[
  Map[Table[
     d[SelfSquares[[m]][[i]][[j]]] -> #[[j]], {j, 1, 
      Length[SelfSquares[[m]][[i]]]}] &, 
   SquarePossibilities[[m]][[i]]], {m, 1, Length[SquarePossibilities]}, {i, 1, 
   Length[SquarePossibilities[[m]]]}];

The SquarePossibilities stage is where my system memory quickly vanishes. I Remove it after generating SquareReplaces, but that doesn't help when I can't even build the list in the first place.

I would then use the following code to eliminate the tuples which definitely won't work:

SquareReplaces2 = 
  ParallelTable[
   Select[SquareReplaces[[m]][[
     i]], (And @@ 
       Map[! TrueQ[# == False] &, ((CubeReduced2[[m]][[
            i]]) /. (#))]) &], {m, 1, Length[CubeReduced2]}, {i, 1, 
    Length[CubeReduced2[[m]]]}];

There may be non-trivial equations left after applying the replacements, which will still need to be solved. That's why I don't require every equation to evaluate to True, but instead keep exactly those tuples which don't produce a False. Hopefully this explanation is sufficient to describe what I'm trying to accomplish.
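To illustrate the 'not False' criterion with a made-up pair of equations (these are hypothetical, not from my actual system):

```mathematica
eqns = {d[1] + d[2] == 1, d[1] d[2] == 0};

(* a partial substitution may leave equations unresolved;
   only a definite False disqualifies a tuple *)
eqns /. {d[1] -> 1}
(* {1 + d[2] == 1, d[2] == 0} *)

And @@ Map[! TrueQ[# == False] &, eqns /. {d[1] -> 1}]
(* True, so the tuple with d[1] -> 1 is kept for later solving *)
```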

I'd provide a sample list of equations I have applied this code to (successfully), but right now my Mathematica has the variables as $d_i$, which don't translate very well into an easy copy-paste. I'll try to fix that if a relevant example is desired.

  • Would it help if you split your Tuples[{1,0},n] into two Tuples[{1,0},n/2] lists and work through each combination of the two lists, joining elements as you go? Commented Jul 7, 2013 at 8:26
  • Refer to the "lattice of subsets" algorithm for finding subsets, which starts from a small number of promising subsets and then combines them to generate bigger subsets, automatically ignoring irrelevant ones. Commented Jul 7, 2013 at 9:11
  • How large is n typically in your cases? I mean, with n=30 you have 2^30 tuples, which is a number too large to do even the simplest loop Do[Null, {i, 2^30}];.
    – halirutan
    Commented Jul 7, 2013 at 13:01
  • @halirutan n=30 is fairly easily achieved in 'small' examples. A particular example I've tried to compute that ran into a memory error had about n=24 to n=28 on each of 5 sets of equations.
    – Zibadawa
    Commented Jul 7, 2013 at 16:41
  • @Zibadawa Preferably, one should be able to copy and paste code blocks into Mathematica and test out the code. TeX code is not functional when pasted into Mma. For big or long problems, a simplified example that illustrates the problem is acceptable (and preferred). You can also use d[i] as your variable and have it formatted as a subscript. See this and perhaps point 4 in this.
    – Michael E2
    Commented Jul 7, 2013 at 20:48

2 Answers


Your explanation is very detailed, and I have to admit it's too detailed for me to work through without a minimal working example. Therefore, view this as some ideas for your first, short-version question. I will present two different approaches: the first takes very long but consumes almost no memory, while the second is very fast but uses much memory during the computation. Finally, I'll show how you can combine them so that they work within your memory restrictions.

If I understood you correctly, you have a function which tells you whether or not a combination {0,1,0,0,1,1,..} is relevant. I will call this function selector, and the one I use for the demonstration here is very simple: a combination is relevant if it contains exactly two 1's. A simple definition of this selector is therefore

selector[combination_] := Plus @@ combination === 2;

Whether or not my idea works for your problem depends partially on your selector. The questions here are, for instance, whether we can include your selector in compiled code, and how fast it is.

Another issue you'll notice is that I don't make use of ParallelTable or related parallel constructs. The reason is simple: since those functions distribute your problem to many subkernels, they need to distribute parts of your data as well. Unless you are absolutely sure you know what you are doing there, this can very easily consume much memory while the speed-up is often non-existent. In fact, I will make use of another kind of parallelization in my second approach.

General idea

The general idea is based on the fact that it is quite easy (and memory-efficient) to create all tuples of {0,1} if you remember that all numbers are stored in binary inside your computer. Therefore, to create all combinations of length 4 you only need to count from 0 to 15 and look at the binary representations. Mathematica has the IntegerDigits function to give you the representation of a number in a different base. Therefore, an equivalent algorithm to Tuples[{0, 1}, 4] is the following

IntegerDigits[Range[0,15],2,4]
(*
{{0,0,0,0},{0,0,0,1},{0,0,1,0},{0,0,1,1},
 {0,1,0,0},{0,1,0,1},{0,1,1,0},{0,1,1,1},
 {1,0,0,0},{1,0,0,1},{1,0,1,0},{1,0,1,1},
 {1,1,0,0},{1,1,0,1},{1,1,1,0},{1,1,1,1}}
*)

The idea has two advantages. Firstly, you can go through all combinations very easily by counting. Secondly, you don't need to store the {0,1,..} combinations explicitly, because a plain list of integers encodes them in a very memory-efficient way.
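For example, the integer and tuple representations convert back and forth losslessly:

```mathematica
FromDigits[{1, 0, 1, 1}, 2]  (* 11 *)
IntegerDigits[11, 2, 4]      (* {1, 0, 1, 1} *)
```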

1. Approach: Slow but memory efficient

With the above idea, the most direct algorithm to test all n-tuples of {0,1} is to go through all numbers 0 to 2^n-1 and test each binary representation for its relevance. If it is relevant, you store it. With this you consume (more or less) only the memory which your final result list would require anyway. Again, remember that you don't need to store a list of combinations as the result, because a list of integers representing the combinations is sufficient.

To try this, we define a function iterativeSelect which implements the idea. It additionally sets a progress variable so that we have a feeling for how long it will take. To store a relevant combination, I Sow the integer representing it. All result combinations are collected using Reap on the outside. Reaping and sowing is a good way to collect results memory-efficiently, and I strongly advise you not to use something like AppendTo.

iterativeSelect[n_, test_] := Last@Last@Reap[Do[
  If[Mod[i, 2^10] === 0, progress = i];
  If[test[IntegerDigits[i, 2, n]],
    Sow[i]
  ], {i, 0, 2^n - 1}
 ]
]

With selector defined as in the beginning of this answer, let's test this approach for n=25

n = 25;
ProgressIndicator[Dynamic[progress], {0, 2^n - 1}]
m1 = MemoryInUse[]/2^20.;
result = iterativeSelect[n, selector]; // AbsoluteTiming
m2 = MemoryInUse[]/2^20.;
m2 - m1

This takes about 156 seconds here and consumes almost no memory (~5kB). The equivalent run with Tuples

Select[Tuples[{0, 1}, 25], selector] // Length

occupies about 17GB of memory during the run (but frees it when it is finished) and takes maybe 1.5 minutes.

If we want to extrapolate how long this method would run for n=30, we can remember that

$$2^{30} = 32\cdot2^{25}$$

so I guess something around $156\cdot32$ seconds which is about 80 minutes.

2. Approach: Fast but memory consuming

The second approach I'm suggesting is to make a compiled function which takes a list of integers representing the combinations and tests the whole list in parallel. What you get is a list of True|False which you can use with Pick to select only relevant combinations.

This clearly is memory consuming because we need the integer list completely during the run. For n=30 this requires about 8GB

ByteCount[Range[2^30]]/2^20.

Remember that we need a second list of True|False, which is the result of the parallel selector. Furthermore, we don't know what Pick does internally.

Another restriction is that you have to be able to compile your selector function completely, otherwise it will not be of much use. Long story short, in the next code block the options to Compile are significant.

parallelSelector = Compile[{{n, _Integer}},
  Plus @@ IntegerDigits[n, 2] === 2,
  CompilationTarget -> "C",
  Parallelization -> True,
  RuntimeAttributes -> {Listable},
  RuntimeOptions -> "Speed"];

What you should always do is use CompilePrint from the CompiledFunctionTools` package to check whether your compiled function is free of any MainEvaluate call.
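A minimal check could look like this (the printed pseudocode should contain no MainEvaluate instruction):

```mathematica
Needs["CompiledFunctionTools`"]
CompilePrint[parallelSelector]
```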

A speed test of the parallelSelector function alone reveals how fast it is compared to the first approach

parallelSelector[Range[0, 2^25]]; // AbsoluteTiming
(* 3.094913 seconds *)

To create all relevant combinations, we now build the True|False list and select the True entries using Pick

result = Block[{data = Range[0, 2^25]},
  Pick[data, parallelSelector[data]]
]; // AbsoluteTiming

This takes about 5 seconds here, and with 32GB of RAM I didn't see any noticeable memory consumption during the run; after all, a list of 2^25 integers needs only 256MB of memory. If I do the same calculation for n=28 on my machine, it finishes fast (within 40 seconds), but it needs a peak memory of about 11 GB during the run

[screenshot: system monitor showing the memory peak during the n=28 run]

Combining the two approaches

Combining the two approaches is now easy. Assume that your machine has enough memory to run parallelSelector for n=25 easily, and that you need to calculate n=30. Then you make a sequential loop that iterates to 2^30 in steps of 2^25. In each step i (of the 32 steps) you calculate the interval [i,i+2^25-1] in parallel. This is similar in spirit to what Sjoerd suggested in his first comment. We can even approximate the runtime quite exactly: since one parallel run for n=25 takes about 5 seconds on my machine and we need 32 steps to iterate to 2^30, I expect about 160 seconds. And indeed

sequencialParallelIterate[n_, step_, parallelTest_] :=
  Flatten@Last@Last@Reap@Do[
    Sow@Block[{data = Range[i, i + 2^step - 1]},
      Pick[data, parallelTest[data]]
    ], {i, 0, 2^n - 1, 2^step}
 ]

sequencialParallelIterate[30, 25, parallelSelector]; // AbsoluteTiming

needs 166 seconds here. If I'm allowed to refer to my system monitor once more:

[screenshot: system monitor; first row CPU usage, second row memory in use during the combined run]

The first row shows the CPU usage and the second one again the memory in use. You see that the memory consumption stays low throughout, because only the data for one parallel chunk is needed at a time. When you correlate this with the CPU usage, you see that the memory peaks occur where the parallel selector runs. Additionally, you can clearly make out every single iteration, because that is when all my cores work very hard.

Conclusion

The benefit of the presented method is that you can tune it easily to your memory/runtime needs. If you are low on memory, you calculate smaller chunks in parallel and iterate more often. If you have enough memory, you can make the chunks that are computed with low-level parallelization larger, and you gain speed.
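As a rough sketch of how one might pick the chunk exponent from a memory budget (the factor of 3 working copies for the data, the True|False mask, and Pick's result is my own assumption, not something I measured):

```mathematica
(* largest step such that data, mask, and picked result fit the budget *)
chunkExponent[memBudgetBytes_] :=
  Floor[Log2[memBudgetBytes/(8*3)]]  (* 8 bytes per machine integer *)

chunkExponent[2^33]  (* an 8 GB budget allows chunks of about 2^28 *)
```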

The fast parallel method depends strongly on your selector not doing very complex things. If your selector depends on global variables, they can most probably be included in the compiled function (using With or another strategy), but if you need something non-trivial like a call to Solve, things get complicated if not impossible.

  • Using binary representations of integers in place of the tuples is very clever. My selector is actually present in the post, but I'll give a more reduced example here: And@@Map[! TrueQ[# == False] &, CubeReduced2 /.#]& Here CubeReduced2 is a list of degree-2 multinomials in the unknowns d[i], and I am attempting to plug in values through substitution rules (if there's a better way, I am unaware). So equations like d[10]^2+d[2]d[3]+d[4]d[9]/4 - 3d[6]d[15]/2 == 1. The coefficients on the unknowns are complex numbers in general, but usually they end up simplifying to rationals.
    – Zibadawa
    Commented Jul 8, 2013 at 3:27
  • I'm not sure I can get Compile to work in my case, unless you can tell me how to get it to deal with needing varying lists of rules like {d[3]->0, d[5]->0, d[11]->1} on one pass, and {d[2]->1, d[3]->0, d[12]->0, d[21]->1} on another pass (etc.). Not all of the variables will be covered by a tuple. It seems the function would have to be re-compiled with a With statement every time, which doesn't sound terribly efficient. The list of equations I'm pretty sure I can squeeze in with With, but the highly varying substitution rules are something I can't figure out.
    – Zibadawa
    Commented Jul 8, 2013 at 6:30
  • @Zibadawa Btw, I haven't forgotten your problem and I'm still thinking about it, but I haven't had time to sit down and try something. Are you in the chat from time to time, so that we might discuss this in person?
    – halirutan
    Commented Jul 9, 2013 at 6:59
  • Huh, there's a chat on the Stack Exchange? I didn't know that. I'll open up the Mathematica one, but I'm about to go to sleep, so I probably won't be around for another 8+ hours. If you are still thinking about it, then maybe keep in mind that the final implementation of the code would have multiple sets of equations (the size and contents, and total number of sets, of which are not known ahead of time) it would want to inspect tuples for (individually), rather than a single set (which is what my own answer is coded for; but that one's easily extended to the general case).
    – Zibadawa
    Commented Jul 9, 2013 at 8:39

Here's the solution I came up with, along the lines of running over tuples of a smaller set instead of the entire set and building up the answer from there.

varnum is the maximum possible number of variables d[i], and CubeReduced2 is the set of equations under consideration.

I'm not sure how best to parallelize the code, so I'll use sequential code. I've adjusted the code to use halirutan's idea of binary representations of integers instead of actual tuples. It turns out, in the examples I've done so far, that it is extremely efficient to work on 5-tuples at a time (instead of n-tuples, with 2^n close to what your machine can handle). This is a feature of the exact equations I'm dealing with, rather than an innate feature of the code; it's convenient and somewhat unexpected. In any case, the size of the intermediate tuples can be adjusted upward to suit the situation. Here's the code:

Clear[ValidTuples, SubBuild]
ValidTuples = {{}};
SubBuild[m_, n_] := d[m] -> n;
SetAttributes[SubBuild, Listable];

Module[{SelfSquares, SelfSquaresPartition, SquareReplaces, len}, 
 SelfSquares = 
  Flatten@Last@
    Reap@Do[If[
       MemberQ[CubeReduced2, d[i]^2 == d[i]] || 
        MemberQ[CubeReduced2, d[i] == d[i]^2], Sow[i]], {i, 1, 
       varnum}];
 SelfSquaresPartition = Partition[SelfSquares, 5, 5, {1, 1}, {}];
 Do[
  len = Length[SelfSquaresPartition[[i]]];
  SquareReplaces = 
   Map[SubBuild[SelfSquaresPartition[[i]], 
      IntegerDigits[#, 2, len]] &, Range[0, 2^len - 1]];
  ValidTuples = 
   Flatten[Table[
     Map[Join[#, SquareReplaces[[x]]] &, ValidTuples], {x, 1, 
      Length[SquareReplaces]}], {{1, 2}, {3}}];
  ValidTuples = 
   Select[ValidTuples, 
    And @@ Map[! TrueQ[# == False] &, CubeReduced2 /. #] &], {i, 1, 
      Length[SelfSquaresPartition]}]]

The code builds up a list of all indices i such that d[i]^2==d[i], then Partitions them into smaller chunks (very small chunks, as mentioned). It then successively builds up the set of 0/1 substitutions which do not fail to satisfy the equations.
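As a quick check of what that Partition call produces, the {1, 1} alignment together with the empty padding list simply leaves a shorter final chunk rather than padding it:

```mathematica
Partition[Range[12], 5, 5, {1, 1}, {}]
(* {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12}} *)
```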

The final answer is contained in ValidTuples. Its elements are actually substitution rules, so the name is a bit of a misnomer.

Since the code works on only 5-tuples at a time, it's possible that using the binary trick is not worth it here; but the algorithm is now robust enough to handle some larger examples, at which point a value larger than 5 may be desirable and the binary digits much more productive.

This code (or rather, an altered version that deals with the various levels of the lists present in the actual data) has successfully worked its way through an example for which my original code received insufficient-memory errors. And it did so in a respectable 3.9 seconds.

Converting the variables from $d_i$ to d[i] also seems to have sped up the entire notebook, as a bit of a side note. I'll definitely remember that.

  • Zibadawa, may I ask some questions regarding your code? Why do you use Reap/Sow, MemberQ, and a loop to get the equations from CubeReduced2? Have you tried Cases? Why the second Do loop? Have you tried Map/Scan (hint: use ; (CompoundExpression) to terminate the pure function in Scan: Scan[somevar = somevalue; &, SelfSquaresPartition])? Considering performance, one thing I have noticed: you generate a BIG list of integers with Range[0, 2^len - 1] if len is not small. Try generating those numbers on the fly; do not store them, there is no information inside!
    – Theo Tiger
    Commented Jul 8, 2013 at 11:22
  • @Theo The list of integers is not being stored; they're passed in as an argument. And the algorithm is specifically designed to keep len small. If you check, the code as written ensures that len is never greater than 5. As for what purpose the Range serves, see halirutan's answer. And as for why I use the functions I do, usually the answer is "because I don't know any better". There are a great many ins and outs of clever and speedy Mathematica coding I don't know, so it's usually a matter of which functions/tricks I learn first that I can make work.
    – Zibadawa
    Commented Jul 8, 2013 at 15:29
  • And as for the 'second' Do loop, it's because I have two distinct things to iterate over (of different sizes, and the second one requires the first one to be completed). The first one can probably be removed by your suggestions; I'll look into that. The second is for running over each piece of the partition.
    – Zibadawa
    Commented Jul 8, 2013 at 15:32
  • The Range[0, 2^len - 1] part is what I think limits you on len. This is one reason why you have to Partition your data into small chunks. When you Map a function over a Range, the Range is evaluated and the result is stored in memory. Although it will be cleared when the computation is complete, it consumes ByteCount[Range[0, 2^len - 1]] bytes of memory during computation, which is roughly 2^(len+2) (empirically). Thus, with len = 30 for instance, 4 GB of RAM will be filled.
    – Theo Tiger
    Commented Jul 8, 2013 at 15:46
  • @Theo You may wish to read the posts again. The entire conceit of the algorithm is to guarantee that a case like len=30 will never happen. In my code, len will never exceed 5. I originally posted the question because a much too large set of tuples was being generated; with suggestions from here, the code was reworked to look at strictly manageably sized tuples (that 5 turned out to be a good idea was a convenient offshoot of the application, rather than the coding).
    – Zibadawa
    Commented Jul 8, 2013 at 15:58
