3

The problem to solve

Solve k-combination whose input may contain repeated items, and return only unique combinations.

e.g input is [0,1,2,2,4], k = 2, the result is:

{(0, 1), (2, 4), (1, 2), (0, 4), (1, 4), (0, 2), (2, 2)}


The solution I can think of

The only general solution I can think is to use a set or map to extract unique results.

e.g with python

from itertools import combinations
print (set(combinations([0, 1, 2, 2, 4], 2)))

Questions

  • Is there a general solution to do this, without using a set/map for filtering.

I can generate combinations of unique items, in order (refer: https://stackoverflow.com/a/76048486), but if there are repeated items, then there will be repeated combinations.

I've read this: https://math.stackexchange.com/a/554237 , seems it says there is no general solution to get all unique combinations, though there do have a formula to get count of unique combinations.

But, I'm not sure.


@Update - Summary & tips

  • According to the answers, there are ways both via iteration & recursion to do this.
  • When translating from python to go:
    • yield in python, need chan in go, or u need return all combinations (which can be huge for large input).
    • python's else for for and while is omitted when break.
    • list slicing in python will make a copy (of reference ?), while slicing a go slice will reuse the same underlying array, (so in go, if the slice being copied will be modified later, and u don't want that change in the copy, then u should copy/append it into a new slice).
14
  • @JosephWood This is an algorithm question, and I don't see any fitting answer there. (I had btw made the same mistake, flagged as duplicate of this, and retracted when I realized this is not a python question...) Commented Apr 20, 2023 at 0:00
  • @KellyBundy, btw didn't know about more_itertools.distinct_combinations. I'll have to check it out. Commented Apr 20, 2023 at 0:44
  • 1
    @JosephWood To me, sympy.multiset_combinations doesn't sound general. Can you use in in Java or Go? Commented Apr 20, 2023 at 0:50
  • Using "set" simplifies the problem drastically, removes unnecessary complexity, so it is viable way.
    – MBo
    Commented Apr 20, 2023 at 3:23

3 Answers 3

3

Here’s a non-recursive enumerator for combinations in lexicographic order, optimized for simplicity over speed. Knuth’s The Art of Computer Programming almost certainly has a speed-optimized version.

def combinations(a, k):
    a = sorted(a)
    while True:
        yield a[:k]
        for i in range(k - 1, -1, -1):
            # a[i + 1 :] is sorted. Sort a[i:].
            x = a[i]
            j = i + 1
            while j < len(a):
                if x < a[j]:
                    break
                a[j - 1] = a[j]
                j += 1
            a[j - 1] = x
            # a[j] is the next greatest element after x. If there are enough
            # elements after it, rotate them into position.
            if j + (k - i) <= len(a):
                rotate(a, i, j, j + (k - i))
                break
        else:
            break


def rotate(a, i, j, k):
    reverse(a, i, j)
    reverse(a, j, k)
    reverse(a, i, k)


def reverse(a, i, k):
    for j in range((k - i) // 2):
        a[i + j], a[(k - 1) - j] = a[(k - 1) - j], a[i + j]


print(*combinations([0, 1, 2, 2, 4], 2))
print(*combinations([0, 1, 2, 2, 4], 3))
print(*combinations([0, 0, 0, 1, 1, 1], 3))

Output:

[0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]
[0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]
[0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]
1
1

Below is an algorithm I developed some time back for this task:

def multiset_combinations(v, m):
    
    v = sorted(v)
    f = [0]
    j = 0

    # Populating f, a list of expanded indices. E.g. For
    #
    #    v = [2, 2, 33, 33, 33, 40]
    #
    # f would be:
    #
    #    [0, 0, 1, 1, 1, 2]
    
    for i in range(1, len(v)):
        if v[i] != v[i - 1]:
            j += 1

        f.append(j)

    v = list(set(v))
    r = [0] * m
    n = len(v)
    
    z   = f[:m]
    idx = [0] * n
    p   = len(f) - m
    
    last = f[-m:]
    last[-1] += 1
    
    # Find the first occurence of each index. E.g. For the
    # same example above, idx would be:
    #
    #   [0, 2, 5]
    
    for i in range(n):
        idx[i] = f.index(i)
    
    # The main algorithm

    while True:
        while z[-1] < n:
            for j in range(m):
                r[j] = v[z[j]]

            z[-1] += 1
            yield r.copy()

        if z == last:
            break
        
        for i in range(m - 2, -1, -1):
            if z[i] != f[p + i]:
                z[i] += 1
                
                for j, k in zip(range(i + 1, m),
                                range(idx[z[i]] + 1, idx[z[i]] + m)):
                    z[j] = f[k]
                
                break

Calling it:

print(*multiset_combination([0, 1, 2, 2, 4], 2))
#> [0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]

print(*multiset_combination([0, 1, 2, 2, 4], 3))
#> [0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]

print(*multiset_combination([0, 0, 0, 1, 1, 1], 3))
#> [0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]

It is pretty efficient as well. It can generate 753564 in just under a second:

v1 = [1,
      2, 2,
      3, 3, 3,
      4, 4, 4, 4,
      5, 5, 5, 5, 5,
      6,
      7, 7,
      8, 8, 8,
      9, 9, 9, 9,
      10, 10, 10, 10, 10,
      11,
      12, 12,
      13, 13, 13,
      14, 14, 14, 14,
      15, 15, 15, 15, 15]
      
def get_time(v, m):
    t1 = time.time()
    list(multiset_combinations(v, m))
    t2 = time.time()
    return t2 - t1
    
get_time(v1, 10)
#> 0.8702290058135986
3
  • 1
    Your python version finishes in 1.38 s, the one from another answer stackoverflow.com/a/76064642 finishes in 9.93 s, on my laptop, so indeed yours is faster. Though go is much faster than python seems.
    – Eric
    Commented Apr 21, 2023 at 13:33
  • 1
    I've translated your answer into go: go.dev/play/p/jdQvr6c5b0g , it finishes in 0.12 s without print, while my translation for the other answer go.dev/play/p/RyanyA38Vuc takes 0.20 s on my laptop. So, yours is still faster. BTW, the above link can't finish in go.dev/play probably due to it takes too much resource, need to run it locally.
    – Eric
    Commented Apr 21, 2023 at 18:06
  • 1
    I added another go translation which send combinations to go channel, which is more like yield in python: go.dev/play/p/3WivvzV2E0u , now it can be run online, it takes 0.15 s on my laptop, and it's generic.
    – Eric
    Commented Apr 21, 2023 at 18:29
1

Here is code that generates these one at a time in Python.

I'm using iterators. Translating away from that would be the hardest part of translating this to another language.

def unique_multisets (multiset, k):
    # Find the repetition of each item.
    count = {}
    for x in multiset:
        count[x] = 1 + count.get(x, 0)

    # Turn this into (value, count) pairs.  The reversed is
    # because we will reverse it later in generating the multisets.
    pairs = list(reversed(count.items()))

    # Calculate a running total.
    total = 0
    totals = []
    for p in pairs:
        total += p[1]
        totals.append(total)

    def recurse (partial, idx):
        if k == len(partial):
            yield partial
        elif idx < 0 or totals[idx] < k - len(partial):
            pass # Can't make combinations.
        else:
            (value, repeat) = pairs[idx]
            max_used = min(repeat, k - len(partial))
            for i in range(1 + max_used):
                partial.append(value)
            for i in range(1 + max_used):
                partial.pop()
                yield from recurse(partial, idx - 1)

    yield from recurse([], len(pairs)-1)

for x in unique_multisets([0, 1, 2, 2, 4], 2):
    print(x)
7
  • Looks correct, wondering is it possible to make it an iteration, (I'll try after fully understand it later). I head that every recursion have an iteration version, but not sure.
    – Eric
    Commented Apr 20, 2023 at 7:56
  • @Eric It goes the other way. Every iteration can be rewritten as recursion. But there are recursive functions that cannot be rewritten as iteration, the Ackermann function being the standard example. I can solve this one iteratively, but it will be a very different solution.
    – btilly
    Commented Apr 20, 2023 at 13:49
  • @btilly That doesn't sound right. Do you mean some restricted kind of "iteration"? Commented Apr 20, 2023 at 14:12
  • @KellyBundy that exactly- you can compute any recursive function iteratively by emulating a call stack with segmenting the entry/exit points appropriately. Primitive recursive functions don't require this "hack"- you can determine the number of loops before entering the loop. Non-primitive recursive functions (like the ackermann function) require an explicit stack. Commented Apr 20, 2023 at 14:27
  • @KellyBundy I meant "bunch of for loops". With a call stack you can emulate anything - that's how we can run recursive functions in the first place.
    – btilly
    Commented Apr 20, 2023 at 14:52

Not the answer you're looking for? Browse other questions tagged or ask your own question.