Informal Problem Statement:
Given a string, e.g. $ACCABBAB$, we want to colour some letters red and some letters blue (and some not at all), such that reading only the red letters from left to right yields the same result as reading only the blue letters.
In the example we could colour them like this: $A\color{blue}{C}\color{red}{CAB}B\color{blue}{AB}$
Therefore, we say $CAB$ is a repeated subsequence of $ACCABBAB$. It is also a longest repeated subsequence (which is easy to check).
Can we compute longest repeated subsequences efficiently?
Formal Question:
Is it NP-hard to decide for a string and some $k$, whether a repeated subsequence of length $k$ exists in the string?
- If so: Which problem can be reduced to this problem?
- If not: What is an efficient algorithm? (obviously, this algorithm can then be used to compute a longest repeated subsequence)
Bonus Question:
Will their always be a repeated subsequence of length $n/2 - o(n)$ if the size of the alphabet is bounded by a constant?
(This is known to be true for binary alphabets.)
Edit 2: The negative answer to the Bonus Question is already known for alphabets of size at least $5$. In fact for alphabets of size $Σ$, there are strings with longest repeated subsequences of a length of merely $O(n · Σ^{-1/2})$. Random strings suffice to show this. The result already existed, but I overlooked it.
Edit: Note:
Some people mean "substring" when they say "subsequence". I don't. This is not the problem of finding a substrings twice.