0
$\begingroup$

Let me first explain the problem using an analogy.

Let's say you have $N$ doors and $M$ keys. Each door can be opened with a combination of keys, each combination is also unique (i.e. there aren't be two doors that are opened with same combination).
For example, to open Door $1$ you need $3$ keys: $A, B$ and $C$. For Door $2$ you need $5$ keys: $C, F, G, E$ and $T$. All doors are accessible (there are no doors behind a door).

Now the question is: can you chose $k$ keys, $k < M$, so that combination of keys lets you open the maximum number doors respect to any other combination of $k$ keys and, if the answer to this question is affirmative, which keys should be picked?

This kind of problem can't be solved by exhaustion since, in my case, there are $100$ keys and just generating all combinations would take centuries (there would be around $5.5\cdot 10^{20}$ combinations if $k=20$).

Im not too knowledgeable in algorithms, but this seems like some kind of constraint satisfaction problem but not exactly. Anyway I'm kind of stuck. If I could just figure out what kind of problem this is and what its called I might just be able to solve this thing in less than a century.

$\endgroup$
12
  • $\begingroup$ Do you have data you can share for your 100-key instance? $\endgroup$
    – RobPratt
    Commented Nov 6, 2019 at 17:18
  • $\begingroup$ This is Maximum coverage problem. $\endgroup$ Commented Nov 6, 2019 at 21:29
  • $\begingroup$ Similar flavor, but not quite the same as the maximum coverage problem. If you interchange $x$ and $y$ in the Wikipedia formulation, you can see that the difference between the problems is AND versus OR. In the OP's problem, each door requires all the keys in its set. In maximum coverage, each covered element requires only one set that contains it. $\endgroup$
    – RobPratt
    Commented Nov 6, 2019 at 21:39
  • $\begingroup$ @RobPratt I dont know if Im allowed to post links here but here:link. Its a google drive link. The doors are actually just recipes to make ejuice and keys would be the ingredients. Each line is one recipe, ingredients are separated by commas. I have wrongly assumed that there are only 100 keys. There are actually 2370 keys(ingredients). $\endgroup$
    – Dangz1
    Commented Nov 6, 2019 at 22:24
  • $\begingroup$ Looks like the filenames are reversed, but I was able to download. What is $k$ here? $\endgroup$
    – RobPratt
    Commented Nov 7, 2019 at 0:32

3 Answers 3

3
$\begingroup$

You can solve this problem via integer linear programming. Let binary decision variable $x_i$ indicate whether you can open door $i$, and let binary decision variable $y_j$ indicate whether you select key $j$. The problem is to maximize $\sum_{i=1}^N x_i$ subject to \begin{align} \sum_{j=1}^M y_j &= k \\ x_i &\le y_j &&\text{if door $i$ requires key $j$}\\ x_i &\in \{0,1\} &&\text{for $i\in\{1,\dots,N\}$}\\ y_j &\in \{0,1\} &&\text{for $j\in\{1,\dots,M\}$} \end{align} By request, here's the SAS code (with more descriptive variable names than x and y) that I used to solve the sample instance:

proc optmodel;
   /* declare parameters and read data */
   set <str> RECIPES;
   read data lib.doors into RECIPES=[var1];
   num numIngredients_r {r in RECIPES} = countw(r,',;');
   set INGREDIENTS_r {r in RECIPES} = setof {k in 1..numIngredients_r[r]} scan(r,k,',;');
   set INGREDIENTS = union {r in RECIPES} INGREDIENTS_r[r];

   /* declare decision variables */
   var UseIngredient {INGREDIENTS} binary;
   var SelectRecipe {RECIPES} binary;

   /* declare objective */
   max NumSelectedRecipes = sum {r in RECIPES} SelectRecipe[r];

   /* declare constraints */
   con Cardinality:
      sum {i in INGREDIENTS} UseIngredient[i] = 200;
   con RecipeImpliesIngredient {r in RECIPES, i in INGREDIENTS_r[r]}:
      SelectRecipe[r] <= UseIngredient[i];

   /* call MILP solver */
   solve;

   /* output solution */
   create data SolutionIngredients from [i]={i in INGREDIENTS: UseIngredient[i].sol > 0.5};
   create data SolutionRecipes from [r]={r in RECIPES: SelectRecipe[r].sol > 0.5};
quit;

Note that the UNION set operator avoids reading a separate ingredients file, and so I did not need to exclude any data. The resulting optimal solution for 200 ingredients yields 1345 recipes.

$\endgroup$
6
  • $\begingroup$ This works for doors needing multiple keys to open? $\endgroup$
    – Dangz1
    Commented Nov 5, 2019 at 20:30
  • $\begingroup$ Yes, the $x_i \le y_j$ constraint applies to each such door-key pair. For example, door 1 has three constraints: $x_1 \le y_A, x_1 \le y_B, x_1 \le y_C$. $\endgroup$
    – RobPratt
    Commented Nov 5, 2019 at 20:59
  • $\begingroup$ Code works though for $k = 200$ Ive gotten a bit different result of 1345. Thanks for your help. $\endgroup$
    – Dangz1
    Commented Nov 8, 2019 at 11:40
  • $\begingroup$ Yes, I got 1345, too, when I did not exclude any data. Glad to help. Did you use SAS? $\endgroup$
    – RobPratt
    Commented Nov 8, 2019 at 13:18
  • $\begingroup$ Yes, I used sas online. I see that there are often multiple solutions for a given k. I was able to use UseIngredient[i].sol[n] to get data for solution $n$ but I cant figure out how to pull data from all solutions to a single data set using create data data_name from... syntax. Do you know how to do that? $\endgroup$
    – Dangz1
    Commented Nov 14, 2019 at 17:04
1
$\begingroup$

Your problem definition is somewhat unclear: how can "the answer to this question" not be affirmative, since for every $k$ there will be some maximum number of doors that can be opened with $k$ keys?

If I understand correctly, your problem is: "Given the set of keys necessary to open each door and an integer $k$, what is the maximum number of doors you can open with $k$ keys"? This problem is NP-hard, even when each door requires exactly two keys. Indeed, this special case is the densest $k$-subgraph problem: given a graph, find a set of $k$ vertices with maximum induced number of edges. Each vertex in the graph is a "key" and each edge is a "door" requiring two keys. The best approximation factor known to be achievable in polynomial time is $O(M^{1/4+\epsilon})$, for arbitrarily small $\epsilon > 0$:

https://dl.acm.org/citation.cfm?doid=1806689.1806719

This result does not directly apply to your problem because in your setting, doors may require more than two keys, but it is surely worth looking into.

As an aside, if you were interested instead in finding the largest possible ratio between the number of doors opened and the number of keys necessary to open them, then your problem would become polynomial-time solvable (using techniques similar to those used for the densest subgraph problem, with no restriction on the size of the subgraph). See, e.g., Lemma 4 in the following paper:

https://arxiv.org/pdf/1802.02562.pdf

$\endgroup$
1
1
$\begingroup$

This problem is known as the Minimum k-Union problem. Let $D_j$ be the set of doors requiring the key $j$, then we need to pick a set of keys $T$ with $|T|=M-k$ minimizing the size of $\bigcup_{j\in T} D_j$. The answer to OP's question is given by the complement of $T$.

This problem is known to be NP-hard, and the exists an $O(n^{0.25+\epsilon})$-approximation algorithm - see https://stackoverflow.com/questions/12424155

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.