Algorithm design: find any path in an undirected acyclic graph which has a total sum of the nodes as a specific value

Question

The question was asked in an interview, and I'm not sure if this is the most optimized answer, but here goes-

You have an undirected acyclic graph, where each node has a non-negative value x and the edges have 0 value. You need to find a path with any number of nodes on it such that the total sum of all node values is exactly k. There are N nodes in the graph. You can start at any node.

I gave the brute-force solution for this problem, which is basically do a dfs on each node until you find a valid "root" node, such that it has a path to one of the other nodes, with a total sum being k. The time complexity of this solution would be n^2, since we are doing DFS on each node.

The question was posed as a delivery problem, where you have N connected houses, each having a package of weight within the range 1 to k, and the truck can start at any house and should follow a path, and has to pick the package from every house that is on that path. The weights should sum exactly to k.

Are you sure it was undirected acyclic? If so, that's just a forest. In which case a modified BFS should work just fine. — ryan, Commented May 22, 2020 at 5:19

Bernardo Subercaseaux · Accepted Answer · 2020-05-22 22:07:04Z

1

There's a solution in pseudo-polynimial $O(nk)$. First, you can safely delete every node of the graph with weight bigger than $k$. For each node $v$ keep an array $A[v]$ of size $k$, where $A[v][i]$ says how many children $u$ of $v$ have paths that start in $u$ (a node $u$ is the start of a path $u_1, \ldots, u_m$ if $u = u_1$ and $u_{i+1}$ is a child of $u_i$ for all $i$) and whose weight is exactly $i$. Also, let $w$ be the function that maps nodes to their weights.

Your graph is a bunch of trees, and you can process each of them individually. For a given tree, choose an arbitrary node as the root, and do dynamic programming in the following way to fill up $A$. For any node $v$ we have that

$$ A[v][i] = \sum_{u \text { is a child of } v} \big[ A[u][i - w(u)] \geq 0\big] $$

If for some $v$ it holds that $A[v][k-w(v)] \geq 1$ we are done. This corresponds to the case where there is a path that starts in $v$ and has weight $k$. It could be however, that the path of weight $k$ we are looking for does not start in $v$, but it rather has $v$ as its closest node to the root. In order to detect this, we can process the tree from bottom to top. We denote the parent of a node $u$ as $p(u)$. For each node $u$, if there is some $j$ such that $1 \leq j \leq w(u)$ and $A[u][j-w(u)]$, then we know there is a path of weight $j$ that starts in $u$. We call this path $\pi_1$. So we only need to check whether there is a path $\pi_2 \neq \pi_1$ of weight $k-j-w(p(u))$ that starts from a child of $p(u)$. This corresponds to check the value of $A[p(u)][k-j-w(p(u))]$. Note that it could happen that $w(\pi_1) = w(\pi_2)$, in which case we would need to check whether $A[p(u)][k-j-w(p(u))] \geq 2$.

Thanks to @j_random_hacker for the suggestions on how to improve the algorithm.

edited May 22, 2020 at 22:07

answered May 22, 2020 at 5:30

Bernardo Subercaseaux

1,2486 silver badges12 bronze badges

$\begingroup$ A couple of observations: (1) Weights are on vertices, not edges. (I think this can be fixed easily.) (2) What if the only valid solution is not a "straight down" path but instead "bends" at some vertex? It seems like you need to test all pairs of children, making this $O(n^2)$ in the worst case. $\endgroup$
– j_random_hacker
Commented May 22, 2020 at 15:01
1

$\begingroup$ @j_random_hacker thanks for pointing those out! I had misread the problem statement. What about now? The current solution handles the case for paths that "bend" at some vertex. It is no better than the $O(n^2)$ algorithm if $k ~ n$, but way better if $k << n$. $\endgroup$
– Bernardo Subercaseaux
Commented May 22, 2020 at 18:31
$\begingroup$ Thanks, I think I understand, but it actually seems clearer to me if you stick with an array in which each element $i$ is the number of children with total weight $i$. Then your second phase can be handled in $O(k)$ time per vertex by scanning the array and checking if there exists an $i$ such that (i != k-i-w(v) && A[i] + A[k-i-w(v)] >= 2) || (i == k-i-w(v) && A[i] >= 2). (Note that counts rather than booleans are necessary to correctly handle the case in which 2 equal-weight paths through distinct children yield the only solution.) This enables everything to stay $O(nk)$ too :) $\endgroup$
– j_random_hacker
Commented May 22, 2020 at 21:03
$\begingroup$ If you clarify exactly what happens in your phase 2, I'll +1. (Key for me was realising that we can charge the update of a parent's state to the child.) BTW I think we are as likely to see $k \gg n$ as vice versa, but your approach suggests other avenues -- e.g., we could first attempt to solve the problem with all weights including the target taken modulo 2. If this fails, we know the original problem has no solution. If it succeeds, we can try again modulo 4, etc. This takes a factor $\log k$ longer for a positive solution but should fail faster when no solution exists. $\endgroup$
– j_random_hacker
Commented May 22, 2020 at 21:04
1

$\begingroup$ I addressed your comment, that was very helpful. About the likelihood, I guess it depends on the specific context of the problem, and as $k$ is a completely independent parameter of $n$, it's plausible that the context of interest has small values of $k$. But you're right in pointing out that nothing guarantees that this is the case, in which case the algorithm could be pretty bad. As the question was asked in an interview, I'd be very surprised if they expected a solution more sophisticated than this one. $\endgroup$
– Bernardo Subercaseaux
Commented May 22, 2020 at 22:10

| Show 4 more comments

ryan · Accepted Answer · 2020-05-28 16:48:15Z

I'm going to assume the graph is of bounded degree, specifically $d = 3$ (i.e. a binary tree). It could be any bounded degree, but it's easier to visualize with a binary tree. If the degree is unbounded, then this becomes slower.

Algorithm

First thing is to root the tree, preferably with a node that splits the tree evenly among its subtrees. That is, its subtrees have the same size. Though this is not required as we can still do an average case analysis.

Next, compute partial sums for each node except the root node (let it be 0). Specifically, the partial sum of each node is the value of the node plus the partial sum of its parent. For a node $v$ this is represented by $p(v)$:

$$p(v) = \text{val}(v) + p(\text{parent}(v))$$

This can be done with a DFS. Let's call the list of partial sums in the left subtree $L$ and the list of partial sums in the right subtree $R$. Choose whichever list is smaller, let's say that is $L$.

Now, sort $L$ and sort $R$. Then using the two pointer method (see method 4 here), we walk from smallest to largest in $L$ and largest to smallest in $R$ looking for the value $k - \text{val}(u)$. Where $u$ is the root node (since it was not included in the partial sums, we must add it back in, or rather subtract it out).

If we find this value, return the two nodes $L$ and $R$ where it was found. The path between these two nodes, through $u$, are a path of value $k$.

If we do not find this value, we know the path of value $k$ does not contain our root node $u$. Thus, we recurse on both (or all) subtrees.

With a binary tree, on average the recurrence would be:

$$\begin{align} T(n) = 2T(n/2) + n\log n\\ \end{align}$$

Which comes out to $T(n) = O(n (\log n)^2)$.

If the degree is not bounded then instead of just scanning $L$ and searching in $R$ we'd have to do this for all pairs of subtrees. Thus, on average we have:

$$T(n) = kT(n/k) + k^2 n \log n$$

Where $k$ is the branching factor.

You could possibly get better bounds by realizing that the average size of $L$ and $R$ for all pairs of subtrees would be $n/k$ thus bringing us to:

$$T(n) = kT(n/k) + k n \log n$$

Example 1

Let's say we have the following graph and $k = 74$

First, root the tree

Second, compute partial sums

Now we have:

$L = [2, 7, 9, 28, 32, 69]$

$R = [3, 14, 16, 20, 45, 51, 86, 88]$

Third, scan through $L$ and $R$ searching for $74 - 1 = 73$

2 + 88 = 90 > 73, decrement $r$
2 + 86 = 88 > 73, decrement $r$
2 + 51 = 53 < 73, increment $l$
7 + 51 = 58 < 73, increment $l$
9 + 51 = 60 < 73, increment $l$
28 + 51 = 79 > 73, decrement $r$
28 + 45 = 73!

We've found 73, so we know a path of value 74 starts where $p(s) = 28$ and ends where $p(t) = 45$. This path is: $[19, 7, 2, 1, 3, 13, 29]$.

Example 2

Let's say in the prior example, $k = 113$. In the third step, we would not find the correct value, so we'd recurse on:

And:

We wouldn't find a solution in the left subgraph since the sum of all values is less than $k$. We can root the right subtree as:

We wouldn't find it crossing 13, so we'd recurse again on:

And:

The left subtree can be thrown out for the same reason we did earlier. Then we would immediately find it in the right subtree as $[41, 29, 43]$ for a path of value 113.

Optimizations

Miscellaneous optimizations are shown here.

Delete large nodes

You can also do what Bernardo suggested by deleting all nodes with value larger than $k$ then call this algorithm on each remaining tree in the forest.

Delete small trees

You can also ignore an entire tree (or subtree) if the sum of all nodes in that subtree is less than $k$.

Root all trees as binary trees

When computing partial sums, since we are not including the root node, we can use this step and convert it into a binary tree (at least from the root's perspective) by adding at most a linear number of nodes.

If the root has $d$ children, then we create a balanced binary tree with $d$ leaves where the leaf nodes are those $d$ children and all other nodes are value 0. We compute the partial sums and follow that formula above to determine if it crosses the root node of this new tree.

For example:

Would convert to:

Then the algorithm would tell you if the path spans the root node (in read) which really means "does this path go from the children [1, 2, 3, 4] to the children [5, 6, 7, 8]?". Then if the answer is no, you recurse on the following:

And:

With this optimization we can bring the recurrence (even without bounded degree) back to:

$$T(n) = 2T(\lceil n/2 \rceil) + n \log n$$

Which again results in $T(n) = O( n (\log n)^2)$.

Sort while computing $p(v)$

You can sort implicitly by running Dijkstra's algorithm instead of just doing a dfs. This won't asymptotically be an improvement however since it will still take $O(V \log V) = O(n \log n)$ to run Dijkstra's.

Nice approach, I think I see what Bernardo was meaning with the Centroid Decomposition now. A couple of suggestions: (1) Adding a new root halfway along some central edge means you always have exactly 2 subtrees to consider, regardless of the maximum degree. (2) For a particular node, once the two subtree partial sums are sorted, you can find a pair summing to $k$ in $O(n)$ time by walking 2 pointers: One forwards through $L$, one backwards through $R$. (This does not change the time complexity though since we still need to sort.) — j_random_hacker, Commented May 27, 2020 at 21:05
I added a variation of (1) in the optimizations section, but yeah the basic idea is we can always convert it to a binary tree while adding no more than a linear number of nodes. I thought about (2) initially but then second guessed myself for some reason. I added it back in there as it's a bit clearer. I also added more details + examples + optimizations. — ryan, Commented May 28, 2020 at 16:49

Stack Exchange Network

Algorithm design: find any path in an undirected acyclic graph which has a total sum of the nodes as a specific value

2 Answers 2

Algorithm