19
$\begingroup$

I'm looking for an algorithm but I don't quite know how to implement it. More importantly, I don't know what to google for. Even worse, I'm not sure it can be done in polynomial time.

Given a set of numbers (say, {1, 4, 5, 9}), I want to enumerate all subsets of this set (its power set, really) in a certain order: increasing sum of the elements.

For example, given {1, 4, 5, 9}, the subsets should be enumerated in this order, "smaller" sets first:

{} = 0

{1} = 1

{4} = 4

{5} = 5

{1, 4} = 5

{1, 5} = 6

{9} = 9

{4, 5} = 9

{1, 9} = 10

{1, 4, 5} = 10

{4, 9} = 13

{5, 9} = 14

{1, 4, 9} = 14

{1, 5, 9} = 15

{4, 5, 9} = 18

{1, 4, 5, 9} = 19

This feels like some unholy mix between a breadth-first search and a depth-first search, but I can't wrap my head around the proper way to mix these two search strategies.

My search space is very large ($2^{64}$ elements) so I can't precompute them all up-front and sort them. On that note, I also don't need to enumerate the entire search space -- the smallest 4,096 subsets is fine, for example.

Can anyone give any pointers or even any clues to google for? Many thanks.

$\endgroup$
4
  • $\begingroup$ Do you need to iterate over these sets in this order, or do you need a 1-1 map from the sets to integers. In other words do you want "foreach set do { ... }" to get the sets in a particular order, or do you want "for the 47th set do this" and "which number is this set?" $\endgroup$ Commented Dec 7, 2011 at 23:19
  • $\begingroup$ Finding each subset's number is trivial -- just sum them up. Consider, purely for example, guessing the bits of a password where each bit has a confidence score that lets you know how likely you got it right. You'd want to try the passwords with the smallest total penalty scores first. Similarly, I want to search through this huge space, trying sets with the smallest penalty first. The sets themselves are used for something else at each step--this algorithm would merely describe which order to try each subset. $\endgroup$
    – Michael
    Commented Dec 8, 2011 at 0:05
  • 1
    $\begingroup$ Related (a more general problem that also has a polytime-delay algorithm): cstheory.stackexchange.com/questions/47023/… $\endgroup$
    – Neal Young
    Commented Jun 12, 2020 at 14:03
  • $\begingroup$ This is mentioned as an open problem here: dl.acm.org/doi/abs/10.1145/3184400 in page 27. $\endgroup$ Commented May 22, 2022 at 14:23

4 Answers 4

17
$\begingroup$

Here's an algorithm. The basic idea is that each number in the original set iterates through the list of subsets you've already found, trying to see if adding that number to the subset it's currently considering results in the smallest subset sum not yet found.

The algorithm uses four arrays (all of which are indexed starting with $0$).

  1. $N$ consists of the numbers in the original set; i.e., $N = [1, 4, 5, 9]$ in your example.
  2. $L$ is the list of subsets found so far.
  3. $A[i]$ contains the subset that $N[i]$ is currently considering.
  4. $S[i]$ is the sum of the elements of subset $i$ in $L$.

Algorithm:

  1. Initialize $N$ to numbers in the original set, all entries of $A$ to $0$, $L[0] = \{\}$, $S[0] = 0$. Let $j = 1$.
  2. For iteration $j$ find the minimum of $S[A[i]] + N[i]$ over all numbers $N[i]$ in the original set. (This finds the subset with smallest sum not yet in $L$.) Tie-breaking is done by number of elements in the set. Let $i^*$ denote the argmin.
  3. Let $L[j] = L[A[i^*]] \cup \{N[i^*]\}$. Let $S[j] = S[A[i^*]] + N[i^*]$. (This updates $L$ and $S$ with the new subset.)
  4. Increase $A[i^*]$ to the next item in $L$ that has no number larger than $N[i^*]$. If there is none, let $A[i^*] =$ NULL. (This finds the next subset in $L$ to consider for the number $N[i^*]$ just added to an existing subset in $L$ to create the subset just added to $L$.)
  5. If all entries in $A[i]$ are NULL, then stop, else increment $j$ and go to Step 2.


For example, here are the iterations for your example set, together with the subset in $L$ currently pointed to by each number.

Initialization:     
{}   1, 4, 5, 9

Iteration 1:    
{}   4, 5, 9
{1}  

Iteration 2:  
{}   5, 9
{1}  4 
{4}

Iteration 3:  
{}   9
{1}  4, 5
{4}
{5}

Iteration 4:  
{}   9
{1}  5
{4}
{5}
{1,4}

Iteration 5:  
{}   9
{1}  
{4}  5
{5}
{1,4}
{1,5}

Iteration 6:  
{}   
{1}  9
{4}  5
{5}
{1,4}
{1,5}
{9}

Iteration 7:  
{}   
{1}    9
{4}  
{5}
{1,4}  5
{1,5}
{9}
{4,5}

Iteration 8:  
{}   
{1}    
{4}    9
{5}
{1,4}  5
{1,5}
{9}
{4,5}
{1,9}

Iteration 9:  
{}   
{1}    
{4}    9
{5}
{1,4}  
{1,5}
{9}
{4,5}
{1,9}
{1,4,5}

And the rest of the iterations just involve adding $9$ successively to each subset already constructed that doesn't include $9$.

$\endgroup$
5
  • 2
    $\begingroup$ My goodness, that's amazing. How did you come up with this? I would never have thought to think of the problem this way. Many thanks for your help. $\endgroup$
    – Michael
    Commented Dec 8, 2011 at 23:05
  • 1
    $\begingroup$ @Michael: I just tried to exploit the idea that any set appearing in the list had to consist of its largest element added to some set that had already appeared in the list. And you're welcome; it was a fun problem to think about. :) $\endgroup$ Commented Dec 8, 2011 at 23:14
  • $\begingroup$ Hi Mike, I have read your implementation and gone through the iterations that you have shown. I would really appreciate if you could show what A[], S[] and L[] are at each iteration. Also, are A[] and N[] of the same length? $\endgroup$
    – yadav_vi
    Commented Jul 31, 2017 at 18:52
  • $\begingroup$ I implemented @MikeSpivey's answer in Python as another answer to OP's question. $\endgroup$ Commented Jun 15, 2018 at 6:14
  • 1
    $\begingroup$ Does this algorithm have delay poly(n) between enumerated sets? In particular, can Step 4 of the algorithm be implemented in time poly(n)? "Step 4. Increase $A[i^*]$ to the next item in $L$... that has no number larger than $N[i^*]$.." Naively, this involves looking through possibly exponentially many entries of $L$ to find the next, so it seems the delay might not be polynomial. $\endgroup$
    – Neal Young
    Commented Jun 12, 2020 at 14:39
5
$\begingroup$

(Too long for a comment.)

Nijenhuis and Wilf's Combinatorial Algorithms details an algorithm for enumerating all the subsets of a given set, based on the Gray code (there is also a modification that makes it produce the subsets lexicographically). The book can be downloaded for free from the link I gave. You can download the FORTRAN routines NEXSUB() and LEXSUB() from Skiena's webpage. Some modification would be needed to have it output results in your desired order.

$\endgroup$
1
  • $\begingroup$ Thank you, I'll look into this. $\endgroup$
    – Michael
    Commented Dec 8, 2011 at 0:20
3
$\begingroup$

For anyone curious, I implemented Mike Spivey's answer in Python.

def get_next_combs(N, n=4):
  A = [0]*len(N)
  L = [set()]
  S = [0]
  j = 1
  while any(elem is not None for elem in A):
    i_star = np.argmin([S[A[i]] + N[i] if A[i] is not None else np.inf for i in range(0, len(N))])
    comb = L[A[i_star]].union((N[i_star],))
    L.append(comb)
    yield comb
    S.append(S[A[i_star]] + N[i_star])
    i = A[i_star]
    A[i_star] = None
    for i_next in range(i+1, len(L)):
      if N[i_star] > max(L[i_next]):
        A[i_star] = i_next
        break

Should the give the following sample output:

>>> for comb in get_next_combs(range(1,6)):
...     print(sum(comb), comb)
... 
1 {1}
2 {2}
3 {1, 2}
3 {3}
4 {1, 3}
4 {4}
5 {2, 3}
5 {1, 4}
5 {5}
6 {1, 2, 3}
6 {2, 4}
6 {1, 5}
7 {1, 2, 4}
7 {3, 4}
7 {2, 5}
8 {1, 3, 4}
8 {1, 2, 5}
8 {3, 5}
9 {2, 3, 4}
9 {1, 3, 5}
9 {4, 5}
10 {1, 2, 3, 4}
10 {2, 3, 5}
10 {1, 4, 5}
11 {1, 2, 3, 5}
11 {2, 4, 5}
12 {1, 2, 4, 5}
12 {3, 4, 5}
13 {1, 3, 4, 5}
14 {2, 3, 4, 5}
15 {1, 2, 3, 4, 5}
$\endgroup$
0
1
$\begingroup$

A simpler Python implementation (based on my previous "Ugly Numbers" solutions but I guess influenced by skimming the answers here). I build the subsets in order of increasing sum and keep a list of subsets built so far. For each number of the whole set, I have a generator that tries to add the number to the subsets built so far. Then I just merge these generators and extend the list with their results.

from heapq import merge

def subsets(wholeset):
    yield set()
    subsets = [set()]
    def add(number):
        for subset in subsets:
            if not subset or number > max(subset):
                yield subset | {number}
    for subset in merge(*map(add, wholeset), key=sum):
        yield subset
        subsets.append(subset)

Demo (Try it online!):

>>> print(*subsets({1, 4, 5, 9}))
set() {1} {4} {1, 4} {5} {1, 5} {4, 5} {9} {1, 4, 5} {1, 9} {9, 4} {1, 4, 9} {9, 5} {1, 5, 9} {9, 4, 5} {1, 4, 5, 9}

Benchmarks with coneyhelixlake's solution, the first case is the case from the question:

First 4096 subsets from random.sample(range(10**6), 64)
  11.9 ms ±  0.8 ms  subsets
 126.7 ms ±  3.4 ms  get_next_combs

First 100000 subsets from random.sample(range(10**6), 64)
 472.2 ms ±  8.4 ms  subsets
2751.2 ms ± 26.1 ms  get_next_combs

First 32768 subsets from random.sample(range(10**6), 15)
 144.4 ms ± 12.9 ms  subsets
 493.0 ms ± 16.4 ms  get_next_combs

Benchmark code (Try it online!:

def subsets(wholeset):
    yield set()
    subsets = [set()]
    def add(number):
        for subset in subsets:
            if not subset or number > max(subset):
                yield subset | {number}
    for subset in merge(*map(add, wholeset), key=sum):
        yield subset
        subsets.append(subset)

def get_next_combs(N, n=4):
  yield set()
  A = [0]*len(N)
  L = [set()]
  S = [0]
  j = 1
  while any(elem is not None for elem in A):
    i_star = np.argmin([S[A[i]] + N[i] if A[i] is not None else np.inf for i in range(0, len(N))])
    comb = L[A[i_star]].union((N[i_star],))
    L.append(comb)
    yield comb
    S.append(S[A[i_star]] + N[i_star])
    i = A[i_star]
    A[i_star] = None
    for i_next in range(i+1, len(L)):
      if N[i_star] > max(L[i_next]):
        A[i_star] = i_next
        break

solutions = [subsets, get_next_combs]

from timeit import default_timer as time
import random
from itertools import islice
import numpy as np
from heapq import merge
from statistics import mean, stdev

def consume(iterator, n):
    next(islice(iterator, n, n), None)

def test(wholeset, first_n):
    print('First', first_n, 'subsets from', wholeset)
    wholeset = eval(wholeset)

    # Correctness
    expect = list(islice(solutions[0](wholeset), first_n))
    for solution in solutions[1:]:
        result = list(islice(solution(wholeset), first_n))
        assert result == expect

    # Statistics initialization
    times = {solution: [] for solution in solutions}
    def stats(solution):
        ts = sorted(times[solution])[:5]
        return '%6.1f ms ± %4.1f ms ' % (mean(ts) * 1e3, stdev(ts) * 1e3)

    # Measure times
    for _ in range(10):
        random.shuffle(solutions)
        for solution in solutions:
            start = time()
            consume(solution(wholeset), first_n)
            end = time()
            times[solution].append(end - start)

    # Report statistics
    for solution in sorted(solutions, key=stats):
        print(stats(solution), solution.__name__)
    print()

test('random.sample(range(10**6), 64)', 4096)
test('random.sample(range(10**6), 64)', 100000)
test('random.sample(range(10**6), 15)', 2 ** 15)
$\endgroup$
6
  • $\begingroup$ This is some of the most difficult-to-understand Python code I've seen. And that's not exactly a compliment :-) Trying to wrap my head around which generator is invoked when, and how they mutate the non-local subsets variable (which BTW shadows the function name)... is a little nauseating. Also, you don't need to always compute max(subset) in add - you only add a number when it's greater than anything in the set, so you can have a subclass of set where you store the maximum when constructing the set and retrieve it in constant time. $\endgroup$ Commented Jun 13, 2022 at 22:03
  • $\begingroup$ @БоратСагдиев Meh, I find it straightforward, but then again, I wrote it :-). I don't mind the shadowing, it's not recursive so it's not a problem, and "subsets" is simply a good name. In Pascal, this "shadowing" is even the mechanism for saying what value to return. Yes, I could find the max in constant time. Maybe I didn't bother because subset | {number} takes linear time anyway, although that only happens if the condition is true. I already don't remember whether I thought about it. I guess a simpler way would be to use lists or tuples, so that the last element is the largest. $\endgroup$ Commented Jun 14, 2022 at 22:13
  • $\begingroup$ A perhaps bigger issue is that this assumes there are no negative numbers (I think the others also assumed this, and the question only had positives). Someone pointed that out elsewhere. That then requires to start not with the empty set but with the set of all negative values. And then over time remove them from the set instead of adding them to the set like positive numbers. I had some difficulty with that, though. $\endgroup$ Commented Jun 14, 2022 at 22:17
  • $\begingroup$ Just tried it with tuples to find max in constant time, building sets only when yielding to the outside caller. That sped it up by factor 2 to 3 in my benchmark. $\endgroup$ Commented Jun 14, 2022 at 22:23
  • $\begingroup$ Right, I didn't think about the fact that subset | {number} takes linear time. It doesn't have to though - you can use linked list where you just "prepend" the new element, without copying the existing data (=subset). BTW you can use max(subset, default=0). Regarding negative numbers, I'm not sure how exactly would you do that. It may require re-adding a removed negative number; for example, for the set {-2, -1, 2} you have the ordering {-1, -2}, {-2}, {-1}, {-2, -1, 2}, {}, {-2, 2}, {-1, 2}, {2}. $\endgroup$ Commented Jun 15, 2022 at 13:13

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .