8

In the one dimensional array S , There might be any number of elements that belong to the set

U:{A,B,C,D,E}  

and repetition is allowed.
Example :

S  = {E,B,D,C,A,D,A,E,E,D,B,B,A,C} 

Question is:

What is the most efficient way in which I can determine the shortest range/path that contains all the elements belonging to set U ,In any given array S ? keep in mind the array can't be sorted.

In the above example the shortest path is that connecting the first 5 elements of the array S.

EDIT :
1) The Number Of Elements of set U isn't constant.

Thanks in advance.

6
  • 2
    What is your first try? Why don't you think it's optimal, or where have you gotten stuck?
    – Marcin
    Commented Apr 13, 2011 at 13:48
  • Also, have you even researched substring-searching algorithms?
    – Marcin
    Commented Apr 13, 2011 at 13:50
  • To increase the chance of getting an answer, you should put at least some effort and show what do you have until now (what options did you consider, why those may be viable or not, etc).
    – Aleadam
    Commented Apr 13, 2011 at 13:51
  • If the first element of S was B instead of E, what would the answer be? {B,D,C,A,D,A,E}?
    – James
    Commented Apr 13, 2011 at 13:52
  • @James, I think it is EDBBAC. Commented Apr 13, 2011 at 13:53

5 Answers 5

3

Interesting homework, but you still have to code yourself.

Good thing is you have not told us which language you use, so I take it as a sign as you've decided to code yourself, which is good.


My best try so far:

Have 2 pointers for the sub string (range), one for the start (smaller index) of the range and one for the end (larger index). Both point to the beginning of the array first.

Have a list for how many ABCDEs are in the range respectively.

Iterate end from left to right.

For every character, increment the number for the character in the list. If the result (incremented how many) > 1(, see if start points to the same character. If yes, move start forward and minus 1, and) while start points to a character the related number for which > 1, move start forward and minus 1.

If ABCDE in the list all >= 1, then we've found a candidate range. Compare it to the shortest length (if any), and if it is smaller, update the shortest length and record the index for the start of the new shortest range.

7
  • What do you mean by "a list for how many ABCDEs are in the range."? and what is the "result" of adding 1 to a list .. do you mean it's length?
    – Adham Atta
    Commented Apr 13, 2011 at 14:24
  • @Adham Ayman, a list (int[5]) to record how many As, how many Bs ... how many Es are in the range. Commented Apr 13, 2011 at 14:28
  • @Adham Ayman, the result is the incremented how many for the specific character. Commented Apr 13, 2011 at 14:29
  • @Adham: you need a way to find out if you missed a character from your set {A, B, C, D, E}. Thus you make a list, better yet a map, whatever: some sort of character counter. Every time you find a character, you add one to that character's counter. Commented Apr 13, 2011 at 14:30
  • @Christian Severin: that's what he said to do. The array is an array of counters, so at any position you know how many of each character are in the substring (between the 2 pointers). If any counter is zero, you need to move the end pointer. Otherwise you can move the start pointer.
    – phkahler
    Commented Apr 13, 2011 at 14:51
1

If I understand the problem correctly I think you need to do (language agnostic)

int partLen <- U.length;
do {
    Vector subSets <- S.partition(partLen);
    foreach set I in subSets
        if I.isEqualTo(U) then
            return true;
        else
            partLen <- partLen + 1;
} while (partLen <= S.length);
return false;

Where partition will break up S into subsets of whatever length and isEqualTo can properly compare sets.

2
  • Does partition produce overlapping susbsets?
    – Adham Atta
    Commented Apr 13, 2011 at 14:28
  • Yes, for S above there'd be 10 subsets of size 5
    – James
    Commented Apr 13, 2011 at 16:00
1

First find different elements in the array, which is O(n) stuff. Then use sliding window approach to find minimum span, in which all these elements are present.

You can see here how to find the minimum window: http://tech-queries.blogspot.com/2010/12/finding-minimum-window-in-array-which.html

1

Here's a simple algorithm which scans the array once, constantly checking if its currently-seen covering range is shorter than the previously seen covering ranges.

For simplicity, I'm going to assume that we can map A, B, C, D, and E to the integers 0-4 so that we can easily reference an array. I haven't checked it thoroughly, so please mentally/actually run it on an example or two to make sure it does what you want.

//Cell 0 is the last index at which we saw an A, cell 1 " " saw a B, etc.
int[] mostRecent = new int[U.length];
mostRecent.setAllValsTo(POSITIVE_INFINITY);

int shortestRange = POSITIVE_INFINITY; //We are trying to minimize this number.
int minIndex = 0; //The beginning index of the range
int maxIndex = POSITIVE_INFINITY; //The ending index of the range.

for(int i=0; i< S.length; i++) {
    int currentValue = S[i]; //This value will be 0-4, corresponding to A-E

    mostRecent[currentValue] = i;

    currentMax = mostRecent.findMax(); //beginning of current range
    currentMin = mostRecent.findMin(); //end of current range
    currentRange = currentMax - currentMin;

    if(currentRange < shortestRange) {
        shortestRange = currentRage;
        minIndex = currentMin;
        maxIndex = currentMax;
    }
}

//currentMax and currentMin now carry the starting and ending indices, use them as you see fit.
return shortestRange;

This is order O(nk), with n=S.length and k=U.length. There are still plenty of optimizations to squeeze out of it, but I don't know if I could bring the worst-case order lower.

2
  • Though This Algorithm Finds The Shortest DISTANCE of the path ,It Doesn't Find the PATH itself
    – Adham Atta
    Commented Apr 13, 2011 at 15:28
  • Oh, that's easy though. Just remember the min and max indices as you go along. Here, I'll modify the code.
    – Cephron
    Commented Apr 13, 2011 at 15:35
0

Here is how I would do it (in pseudocode)

let counters[] be an array such that 
counters[j] = number of occurrences of character j, 
where j = 0 for 'A', j = 1 for 'B', etc.

build counters[] by scanning the original string s

let positions[j][] be an array listing the positions occupied by 
character j in the original string s; note the size of 
positions[j][] is equal to counters[j]

build positions[j][] by scanning the original string s;

let currentPositions[] be an array such that 
positions[j][currentPositions[j]] gives the position of the next 
occurrence of character j in the original string s

initially currentPositions[j] = 0 for every j = [0 .. u.length]

let bestLength = s.length
let bestMin = 0
let bestMax = 0
for i in [0 .. s.length] {
    for j in [0 .. u.length] {
        if ( 
          positions[i][currentPositions[i]] < i and 
          currentPositions[j] + 1 < counters[j]
        )
          currentPositions[j]++
    }
    let min = s.length
    int max = 0
    for j in [0 .. u.length] {
        curPos = positions[j][currentPositions[j]
        if (curPos > max) let max = curPos
        if (curPos < min) let min = curPos
    }
    if (max - min + 1 < bestLength) {
        let bestMin = min
        let bestMax = max
        let bestLength = max - min + 1
    }
}

the shortest path is that starting at bestMin, ending at bestMax, 
and having length bestLength

Complexity is O(nk), where n = s.length and k = u.length

2
  • Notice that Adham said that the size of U isn't constant - this means your complexity is O(nk), where k is the size of U. (See my solution, which appears to be similar to yours)
    – Cephron
    Commented Apr 13, 2011 at 16:49
  • @Cephron: you're right: complexity is O(nk) (I edited my answer), and our solutions are similar
    – MarcoS
    Commented Apr 13, 2011 at 17:50

Not the answer you're looking for? Browse other questions tagged or ask your own question.