3
$\begingroup$

I have the following problem: given an $m \times n$ binary matrix $A$ like e.g. the following $9 \times 39$ matrix:

111000011100001101000001001111111111000
111000011100001101000000101111111111000
111111111111110010010100000011100000000
000111100011111100110000001100011111010
111111111111110010010010000011100000000
111000011100001101000000011111111111000
000111100011111100110000001100011111100
111111111111110010011000000011100000000
000111100011111100110000001100011111001

I want to find the maximum value of the product $(1+|C_1|)(1+|C_2|)$, where $C_1$ and $C_2$ are any two disjoint subsets of the column indexes of $A$, such that for every $i \in C_1$ and every $j \in C_2$ and for every $1 \le k \le m$, either $a_{ki} = 1$ or $a_{kj} = 1$. It is not required that $C_1 \cup C_2 = [n] = \{1, \ldots, n\}$.

For example, if we permute rows and columns of the above matrix we get:

111111111111111111000000000100100000000
111111111111111111000000000100010000000
111111111111111111000000000100001000000
111111111000000000111111111010000100000
111111111000000000111111111010000010000
111111111000000000111111111010000001000
000000000111111111111111111001000000100
000000000111111111111111111001000000010
000000000111111111111111111001000000001

and then it is clear that $\max{(1+|C_1|)(1+|C_2|)} = 10 \cdot 20 = 200$.

The matrix $A$ can be regarded as the biadjacency matrix of a bipartite graph and $C_1$ and $C_2$ correspond to two complete bipartite subgraphs such that the union of their respective "row" parts gives the "row" part of the bipartite graph corresponding to $A$.

I could try all possible choices for $C_1$ and then evaluate the product based on the $C_2$ that follows, but the algorithm has a time complexity greater than $O(2^n)$.

Can we do better than that? Is there any measure (e.g. some entropy measure) to at least estimate the maximum value?

A possible generalization of the problem would be computing (or estimating) the maximum of the product $\prod_{r=1}^{q}{(1+|C_r|)}$ (with any $q \ge 2$ that maximizes the product) and $C_1, \ldots ,C_q$ disjoint, such that for every $i \in C_s$ and every $j \in C_t$, $1 \le s \lt t \le q$, and for every $1 \le k \le m$, either $a_{ki} = 1$ or $a_{kj} = 1$.

$\endgroup$
1
  • 2
    $\begingroup$ We can define a graph over the columns, where two columns are connected if their union isn't $[m]$. The condition is exactly that there aren't two connected nodes with one in $C_1$ and one in $C_2$. This condition can be realized as a 2-SAT instance, but I'm not sure how you can optimize $(1 + |C_1|)(1 + |C_2|)$ over it. $\endgroup$ Commented Jun 28 at 14:02

2 Answers 2

4
$\begingroup$

Similarly to what Daniel Weber suggested in the comments, construct a graph $G$ on columns as vertices and connect every pair of vertices with an edge whenever the corresponding columns has at least one 1 in each row. You are interested in a non-induced biclique (i.e. complete bipartite subgraph) in $G$ with partite sets $C_1$ and $C_2$ having the maximum value $(1+|C_1|)(1+|C_2|)$. Such a biclique will necessarily be maximal, and there exist algorithms for finding all maximal bicliques - e.g. see this paper.

Alternatively, one can notice that there are only a finite number (less than $n^2$) values of $(1+|C_1|)(1+|C_2|)$, go over them in decreasing order and search for the corresponding complete bipartite graph(s) as a subgraph in $G$ until one is found.

$\endgroup$
1
  • $\begingroup$ Regarding the alternative method, once I fix a value for the product, how do I search efficiently for the corresponding complete bipartite subgraphs? $\endgroup$ Commented Jun 28 at 18:16
4
$\begingroup$

You can solve the problem via integer linear programming as follows. Let $$E=\left\{(j_1,j_2)\in[n]\times[n]: j_1 < j_2 \land \lnot \bigwedge_{i=1}^m (a_{ij_1} \lor a_{ij_2})\right\}$$ be the set of edges in the graph described in the comment by Daniel Weber. Let $P=\{1,2\}$ be the set of parts. For $j\in [n]$ and $p\in P$, let binary decision variable $x_{jp}$ indicate whether column $j$ appears in part $p$, so that $C_p=\{j\in[n]:x_{jp}=1\}$. For $p\in P$ and $k\in \{0,\dots,n\}$, let binary decision variable $y_{pk}$ indicate whether part $p$ has cardinality $k$. To linearize the product $(1+|C_1|)(1+|C_2|)$, take logarithms. The problem is to maximize $$\sum_{p \in P} \sum_{k=0}^n \log(1+k) y_{pk}$$ subject to linear constraints \begin{align} \sum_{k=0}^n y_{pk} &= 1 &&\text{for $p \in P$} \tag1\label1\\ \sum_{k=0}^n k y_{pk} &= \sum_{j=1}^n x_{jp} &&\text{for $p \in P$} \tag2\label2\\ \sum_{p \in P} x_{jp} &\le 1 &&\text{for $j \in [n]$} \tag3\label3\\ x_{j_1,p} + x_{j_2,3-p} &\le 1 &&\text{for $(j_1,j_2) \in E$ and $p \in P$} \tag4\label4 \end{align} Constraint \eqref{1} selects exactly one cardinality for $C_p$. Constraint \eqref{2} forces the selected cardinality for $C_p$ to be consistent with $x_{jp}$. Constraint \eqref{3} enforces disjoint parts. Constraint \eqref{4} prevents assigning columns $j_1$ and $j_2$ to different parts. One optimal solution is \begin{align} C_1&=\{1,2,3,8,9,10,15,16,18,27,28,29,30,31,32,33,34,35,36\} \\ C_2&=\{4,5,6,7,11,12,13,14,20\} \end{align} with $|C_1|=19$ and $|C_2|=9$.

$\endgroup$