NP completeness of deciding whether a set of examples, consisting of strings and states, has a corresponding DFA?

Question

I'm working on a textbook problem, 7.36 in Sipser 3rd edition. It claims that if we are given an integer $N$ and set of pairs $(s_i, q_i)$, where $s_i$ are binary strings and $q_i$ are states (we are unconcerned here with whether the states are accepting), then the problem of "deciding whether there exists some $N$-state DFA for which each $s_i$ causes that DFA to end up in state $q_i$" is NP-complete.

A much more well-known result by Gold (1978) is the same as the above, except with the alphabet being arbitrary instead of binary, and the examples being in the form $(s_i, \mathrm{accept})$ and $(s_j, \mathrm{reject})$ instead of having the state $q_i$. That result is discussed in some other threads, like this, this, and this.

If I naively try to reduce from the Gold problem to the Sipser problem, I would try to replace all "accept" with a single "terminal" accept state, and same for reject, but that fails because if you wanted only a single terminal accept state to handle all accepted strings, you would need to have an NFA with epsilon transitions; if you try to convert that back to DFA, the single terminal states may split up into multiple, which would invalidate your examples.

I suspect Myhill-Nerode may be useful here, since the DFA states correspond to the equivalence classes of strings. I think you can rephrase the Sipser result as deciding whether some given set of strings can be partitioned into some given set of Myhill-Nerode equivalence classes.

I searched through the literature, but was not able to find the Sipser result anywhere. What is another decision problem that we can use as a polynomial-time reduction here? Alternatively, is this result derived or discussed anywhere else?

"you would need to have an NFA with epsilon transitions" -- wouldn't it suffice to make the final character in each $s_i$ transition to this unique accept state? — j_random_hacker, Commented May 20, 2021 at 10:33
@j_random_hacker I heard this same idea independently from various other people. I wrote it into an answer below; comments welcomed. — xdavidliu, Commented May 26, 2021 at 23:36
@j_random_hacker thanks for the comments; and you're right that I forgot to mention that the Gold result is for arbitrary alphabet; I just made an edit to fix that. I'll look closer at your comments and have revisions soon. — xdavidliu, Commented May 27, 2021 at 18:04

xdavidliu · Accepted Answer · 2021-05-27 21:55:18Z

(Note: the below proof may have some issues raised in the comments. I am in the processing of addressing the issues.)

The following "end-character" method was suggested in the comment above, as well as private communication with some others. Any mistakes are my own.

We construct a polynomial-time reduction from the Gold problem (arbitrary alphabet with "accept" or "reject" in the examples) to the Sipser problem (binary alphabet with states in the examples). Since the Gold problem is NP-complete, it then follows that the Sipser problem is also NP-complete.

The Gold decision problem gives us an integer $n$ and pairs $(s_1, x_1), (s_2, x_2) \ldots (s_k, x_k)$ [where each $s_i$ is a string in some arbitrary alphabet $\Sigma$, and each $x_i$ is either "accept" or "reject"] and asks if there exists an $n$-state DFA $G$ that for each $i$, accepts or rejects $s_i$ based on the value of $x_i$.

We claim that the Gold input has such an $n$-state DFA if and only if the examples $(t_1, y_1) \ldots (t_k, y_k)$ has a corresponding $f(n)$-state DFA $H^\prime$ in the Sipser problem, where:

$t_i$ is some binary encoding of $s_i$ plus a common suffix $Z$ that is unambiguously different from any of the other encoded characters in $\Sigma$.
$y_i = q_{x_i}$; that is, either $q_\mathrm{accept}$ or $q_\mathrm{reject}$
$f(n)$ is a function that be computed and written down in time polynomial in $\log n$.

First, suppose the Gold DFA $G$ exists. Then, there exists a size $O(n \log \lvert \Sigma \rvert)$ DFA $H$ that recognizes the homomorphism ($\Sigma$ encoded into binary) of the language of $G$: simply take every transition in $G$ and split up into transitions corresponding to the binary encoding. Since $G$ may have more than one accept state, so does $H$ (which has the same number of accept states as $G$). Hence, modify $H$ by taking every accept state and inserting transitions corresponding to the "end of string" bitstring $Z$ to a single newly-created $q_\mathrm{accept}$, and inserting the same transitions from non-accept states in $H$ to a single newly-created $q_\mathrm{reject}$. The resulting DFA $H^\prime$ clearly fits the examples $(t_i, y_i)$, and its number of states is bounded above by some easily computed $f(n)$.

Now for the other direction, suppose some $f(n)$-state DFA $H^\prime$ exists which fits the examples $(t_i, y_i)$. Then, since by construction every $t_i$ has the "end of string" bitstring $Z$ as a suffix, we simply take $H^\prime$ and for every state $q$ for which reading $Z$ would take $H^\prime$ to $q_\mathrm{accept}$, we make $q$ an accept state. Then, we remove $q_\mathrm{accept}$ and $q_\mathrm{reject}$, and we have the DFA $H$. We can then construct a DFA $G$ which recognizes the same language with characters decoded from binary to alphabet $\Sigma$. This construction can easily be done by starting with the start state in $G$, trying out every encoded character in $\Sigma$ and writing down the transitions in $G$, and recursively repeating. Since $H$ is finite, this procedure is guaranteed to terminate in polynomial time.

(Note: the above solution simply quotes the Gold result, the proof of which is pretty involved. There's probably a simpler method that directly proves the Sipser result instead of using Gold as an intermediary.)

In your question, you said the two problems were the same apart from whether states or acceptance is given for each string, but now it seems the alphabets are different too (binary vs. unrestricted). I think this distinction is much more important. Other points: (1) I'm used to seeing an explicit $f(n)$. I think not giving $f(n)$ explicitly can work, but you need $f(n)$ to be injective (otherwise, the construction for the second direction cannot be well defined: There is some pair $n_1 \neq n_2$ such that $f(n_1) = f(n_2)$, so given $f(n_1)$ it must construct two DFAs of different sizes.) — j_random_hacker, Commented May 27, 2021 at 2:12
(2) In general you'll need $O(n|\Sigma|)$ states for the binary DFA. Consider a state in $G$ with $|\Sigma|=2^k$ distinct successor states: Simulating this with a binary encoding requires a tree of depth $k=\log_2 |\Sigma|$ containing $2+4+\dots+|\Sigma|/2 = |\Sigma|-2$ intermediate states in total. — j_random_hacker, Commented May 27, 2021 at 2:21
(3) I don't think the second direction is locked down yet, because you need to show that the DFA $G$ that you reconstruct from $H'$ has at most $n$ states. Unless I'm missing something, I think this will be hard/messy to show, because in some sense this entails "inverting" $f(n)$, and at the outset you lack knowledge of the structure of $H$ -- all you know is that it has exactly $f(n)$ states. It could be that the binary encoding chosen allows a "highly optimised" $H$ containing far fewer than $f(n)$ useful states, plus a bunch of unreachable "ballast" states. — j_random_hacker, Commented May 27, 2021 at 2:34

Stack Exchange Network

NP completeness of deciding whether a set of examples, consisting of strings and states, has a corresponding DFA?

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
regular-languages
finite-automata
np-complete
polynomial-time-reductions
or ask your own question.

Hot Network Questions

NP completeness of deciding whether a set of examples, consisting of strings and states, has a corresponding DFA?

1 Answer 1

Not the answer you're looking for? Browse other questions tagged regular-languagesfinite-automatanp-completepolynomial-time-reductions or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
regular-languages
finite-automata
np-complete
polynomial-time-reductions
or ask your own question.