One Piece Treasure- Find the number of palindromic substrings in a large substring

Question

I'm trying to solve the below question:

A string of characters X (all in lowercase) of length n is present. We can ask a query (i,j) such that if the substring(i,j) of X is a palindromic subsequence. The query return true if the substring(i,j) of X is palindromic else returns false. Each query takes O(1) time to process. The time complexity cannot exceed n∗log(n)*log(n), i.e., we cannot ask more than n∗log(n)*log(n) queries (Assume the string is very big i.e. n is a large number)

We need to determine the number of palindromic substrings in the hidden hidden string.

I'm thinking of the following pseudocode:

function countPalindromicSubstrings(n, query_request):
    count = 0
   
    # Count palindromic substrings of odd length
    for center in range(n):
        radius = 0
        while center - radius >= 0 and center + radius < n:
            if query_request(center - radius, center + radius):
                count += 1
                radius += 1
            else:
                break
    
    # Count palindromic substrings of even length
    for center in range(n - 1):
        radius = 0
        while center - radius >= 0 and center + radius + 1 < n:
            if query_request(center - radius, center + radius + 1):
                count += 1
                radius += 1
            else:
                break
    
    return count

I'm unable to prove this is the most optimal strategy and there doesn't exist a better strategy for getting the work done in better time complexity with Time Complexity Analysis & Correctness Argument.

One way to optimize this algorithm a bit is that we know an even-length substring cannot be palindromic if one half of it is and the other half isn't. Also, if you have a palindromic substring as the nucleus, the larger substring that expands it by equal lengths to the left and right can not be palindromic if either the left or right segment is palindromic and the other is not. You could therefore get better average-case time complexity by caching the results for the smaller substrings in a memo table. — Davislor, Commented Aug 27, 2023 at 19:55
@Davislor I've edited the question. please have a look again. Also the other comment you're suggesting, are you suggesting we memoise the code? — driver, Commented Aug 27, 2023 at 19:58

Justin Chang · Accepted Answer · 2023-08-28 17:26:03Z

I think your approach is almost there. Let's suppose substring(i,j) uses half-open intervals, aka it tests [i, j). Here is how I would get O(n log n) query complexity:

Each palindrome is of even length or odd length, as you observed.
There are at most n centers of odd palindromes: 0, 1, ... , n - 1.
There are at most n-1 centers of even palindromes: 0.5, 1.5, ... , n - 1.5. For simplicity, we can index by the element left of center, as you are doing: 0, 1, ... , n - 1.
Each palindrome has a unique center, so we can simply compute the number of palindromes at each center and sum them up like you're doing.
Suppose a palindrome of length d has radius r = ceiling(d/2) - 1. Observe that the empty string has radius -1 and that single letters have radius 0.
Given a center, it takes log2(n) queries to use binary search and find its largest radius R. The key observation is noting that you can delete the leftmost / rightmost characters of a palindrome and it's still a palindrome. Thus, there are R+1 palindromes at that center.
For even length palindromes, query on [center - radius, center + 2 + radius) performing binary search in radius = [0, 1, 2, ... , min(center, n - center - 2)].
For odd length palindromes, query on [center - radius, center + 1 + radius), performing binary search in radius = [0, 1, 2, ... , min(center, n - center - 1)].

If we put it all together, you get at most (2n-1) * log2(n) queries which is O(n log n). O(1) extra space is needed. Hope that helps. I'll write up some code later if I have time.

EDIT: Here is an implementation that doesn't include empty strings in the count.

def count_pd(n, query_request):
    count = 0

    # odd length palindromes
    for center in range(n):
        for radius in range(1 + min(center, n - center - 1)):
            if query_request(center - radius, center + 1 + radius):
                count += 1
            else:
                break

    # even length palindromes
    for center in range(n - 1):
        for radius in range(1 + min(center, n - center - 2)):
            if query_request(center - radius, center + 2 + radius):
                count += 1
            else:
                break

    return count


def fancy_pd(n, query_request):
    count = 0

    # odd length palindromes
    for center in range(n):
        # binary search on [left, right]
        left = 0  # the palindrome "a" has radius 0
        right = min(center, n - center - 1)
        while left < right:
            radius = (left + right + 1) // 2
            if query_request(center - radius, center + 1 + radius):
                left = radius
            else:
                right = radius - 1
        count += (left + 1)  # left is now R

    # even length palindromes
    for center in range(n - 1):
        # binary search on [left, right]
        left = -1  # the palindrome "" has radius -1
        right = min(center, n - center - 2)
        while left < right:
            radius = (left + right + 1) // 2
            if query_request(center - radius, center + 2 + radius):
                left = radius
            else:
                right = radius - 1
        count += (left + 1)  # left is now R

    return count


def my_query(i, j):
    global n_queries
    n_queries += 1
    return target[i:j] == target[i:j][::-1]


def count_vs_fancy(x):
    global target
    global n_queries
    target = x

    n_queries = 0
    print("COUNT / QUERIES:", count_pd(len(x), my_query), "/", n_queries)

    n_queries = 0
    print("FANCY / QUERIES:", fancy_pd(len(x), my_query), "/", n_queries, "\n")


target = ""
n_queries = 0
print("aba")
count_vs_fancy("aba")
print("aaaaaa")
count_vs_fancy("aaaaaa")
print('"".join(["a" * 2000]) + "bcb" + "".join(["a" * 2000])')
count_vs_fancy("".join(["a" * 2000]) + "bcb" + "".join(["a" * 2000]))

The tough part is avoiding off-by-one errors. Here is some output:

aba
COUNT / QUERIES: 4 / 6
FANCY / QUERIES: 4 / 3 

aaaaaa
COUNT / QUERIES: 21 / 21
FANCY / QUERIES: 21 / 14 

"".join(["a" * 2000]) + "bcb" + "".join(["a" * 2000])
COUNT / QUERIES: 4004004 / 4008008
FANCY / QUERIES: 4004004 / 78305

\$\begingroup\$ this was really insightful to add the code. \$\endgroup\$
– driver
Commented Aug 28, 2023 at 18:03 — driver, Commented Aug 28, 2023 at 18:03

Davislor · Accepted Answer · 2023-08-28 14:12:21Z

In this problem, you’re trying to minimize the number of substring queries. You can do this, in the average case, with memoization, compared to the brute-force search you are doing now.

However

The problem statement says that the answer must make no more than O(n log n) queries, and the algorithm below makes O(N²) queries in the worst case (when the string consists of a single repeated character).

You could enhance other algorithms with some form of memo-ization. Write a wrapper that maintains a lookup table indexed by (i,j), where j > i, storing true/false values. Whenever you make a query, check the table first, and if the result is not cached, add it. Call the provided query function only through this wrapper. This way, you can avoid making duplicate queries.

But Anyway

Let’s take as an example the string "abaabb". The six length-one substrings are trivially palindromic and do not need to be checked. There are five length-two substrings, of which (3,4) (aa) and (5,6) (bb) are palindromes. There are four length-three substrings, of which only (1,3) (aba) is palindromic.

The rest of the way, we can use the queries we have already done to optimize.

From this point on, if we remember which smaller substrings are and are not palindromic, we can start optimizing to avoid unnecessary queries. A string can only be palindromic if its middle is, and if its leftmost segment and its rightmost segment both are or neither are. So, for example, abba is palindromic (and neither ab nor ba are), and aaaa is palindromic (and both its left half and right half are). But, if we already know that the left half of abaa is not palindromic and the right half is palindromic, we do not need to query the full substring. We know it cannot be.

There are three length-four substrings. We know that a length-four substring can only be a palindrome if its middle is. This allows us to skip checking three of the four; only (2,5) (baab) has a palindromic middle. We can additionally look up in our dynamic table whether the left half (2,3) and the right half (4,5) are. Neither are, so the test passes and we do need to query whether this substring is a palindrome, and we find it is. So we’ve saved three queries over the brute-force search.

There are two length-five substrings. Neither has a palindromic middle, so we can skip querying either one. We could also have eliminated (2,6) (baabb) on the grounds that the leading ba is not palindromic but the trailing bb is.

Finally, there is the one length-six substring, consisting of the whole string. The middle segment (2,5) (baab) is palindromic, so we need to test the left and right segments. The left half is palindromic and the right half is not, or alternatively the leftmost two characters are not and the rightmost two are. Either is enough to tell is that the string cannot be palindromic, so we do not need to test it.

A brute-force search would have needed to do 15 searches, which is 50% more than we need if we use dynamic programming.

So, a better algorithm for this (given the artificial limitation that we cannot see the string, only whether an arbitrary substring is palindromic) might work from smaller to longer substrings and, for each substring, look up the memo-ized results for the left, middle and right substrings (for each possible radius) and check whether the previous queries already rule out it being palindromic. Only if all the checks pass do we need to query the substring.

We are not being graded on the time it takes to do this (or it’s assumed to be much faster than asking the Oracle a question), only on the number of queries we make, so eliminating as many queries as we can counts as a win for this problem.

Thank you so much. Can you please give a pseudocode for this approach? It gets a bit trickier when you say memorization but at the same time we need to make sure about the constraints. — driver, Commented Aug 28, 2023 at 8:10
@driver My algorithm is O(N^2) in the worst case, which doesn't fit the specification. — Davislor, Commented Aug 28, 2023 at 13:53

Stack Exchange Network

One Piece Treasure- Find the number of palindromic substrings in a large substring

2 Answers 2

However

But Anyway

Not the answer you're looking for? Browse other questions tagged
strings
complexity
palindrome
or ask your own question.

Hot Network Questions

One Piece Treasure- Find the number of palindromic substrings in a large substring

2 Answers 2

However

But Anyway

Not the answer you're looking for? Browse other questions tagged stringscomplexitypalindrome or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
strings
complexity
palindrome
or ask your own question.