4
\$\begingroup\$

Although there exists the most efficient "Knuth-Morris-Pratt" algorithm of string pattern matching. I tried to achieve the same result with a different approach. But I am not sure about it's efficiency. I am a novice in programming. Can anyone show me how to analyze this algorithm and find it's efficiency?

Algorithm:

Step 1: Calculate the integer value of the pattern by adding the integer values of each of the characters in it. Let it be "m"

Step2: Calculate the integer values of parts of the string, with the same length as that of the pattern in the similar way as described in step 1.

Step3 : Store these values in an array.

Step 4 : Compare the values in array with the "m" value from step1.

Step 5: If the values match then check if the part of the string matches with the pattern. If it does then return true. If the pattern does not match at all then return false.

I have coded the algorithm in Java. Here is the code.

public class StringPatternMatchingAlgorithm {
    public static boolean patternMatching(String str,String pattern){
        int patternLength = pattern.length();
        int patternSum = 0;
        int[] patternSumArray = new int[(str.length() - patternLength)+1];
        int patternSumArrayIndex = 0;
        int patternSumToCheck = 0;
        String toCompare = "";
        for(int i=0; i<pattern.length();i++){
            patternSumToCheck = patternSumToCheck + pattern.charAt(i);
            
        }
        for(int i=0; i<((str.length())-(pattern.length()-1));i++){
            patternSum = 0;
            for(int j=i; j<patternLength;j++){
               patternSum = patternSum + str.charAt(j);
            }
            patternLength++;
            patternSumArray[patternSumArrayIndex] = patternSum;
            if(patternSumArrayIndex < patternSumArray.length-1 ){
                patternSumArrayIndex++;
            }    
        }
        for(int i=0; i<patternSumArray.length;i++){
            if(patternSumArray[i] == patternSumToCheck){
                toCompare = "";
               for(int j=i; j<(i+pattern.length());j++){
                   toCompare = toCompare + str.charAt(j);
               }
               if(toCompare.equals(pattern)){
                  return true; 
               }
            }
        }
        return false;
    }
    public static void main(String[] args) {
        String str = "cdxabie";
        String pattern = "bie";
        System.out.println(patternMatching(str, pattern));
    }
 }

The code can be run here

Edit: As suggested in the comments to the first version of algorithm, I have committed a modification. I used two variables "forthIndex" which is initialized to pattern's length and "backIndex" initialized to 0. Also, I calculated the integer sum of number of characters, equal to the length of pattern from index 0 and stored it in a variable named "patternSum" only once. Then onwards the algorithm adds the integer value of character at "forthIndex" to "patternSum" and substracts the integer value of character at "backIndex" from "patternSum" simultaneously, as the string is traversed. This eliminates the recalculation of integer sum of characters in string as it was in the former version of algorithm.

Here is the improved code:

public class StringPatternMatchingAlgoVariant1 {

public static boolean matchPattern(String str,String pattern){
    int patternLength = pattern.length();
    int patternSum = 0;
    int[] patternSumArray = new int[(str.length() - patternLength)+1];
    int patternSumArrayIndex = 0;
    int patternSumToCheck = 0;
    String toCompare = "";
    int backIndex = 0;
    int forthIndex = pattern.length();
    for(int i=0; i<pattern.length();i++){
        patternSumToCheck = patternSumToCheck + pattern.charAt(i);
        patternSum = patternSum + str.charAt(i);            
    }
    patternSumArray[patternSumArrayIndex] = patternSum;
    patternSumArrayIndex++;
    for(int i=0; forthIndex<str.length();i++){
        patternSum = patternSum + str.charAt(forthIndex) - str.charAt(backIndex);            
        forthIndex++;
        backIndex++;
        patternSumArray[patternSumArrayIndex] = patternSum;
        patternSumArrayIndex++;             
    }
    for(int i=0; i<patternSumArray.length;i++){
        if(patternSumArray[i] == patternSumToCheck){
            toCompare = "";
           for(int j=i; j<(i+pattern.length());j++){
               toCompare = toCompare + str.charAt(j);
           }
           if(toCompare.equals(pattern)){
              return true; 
           }
        }
    }
    return false;
}
public static void main(String[] args) {
    String str = "cdxabie";
    String pattern = "bie";
    System.out.println(matchPattern(str, pattern));
}  

}

The new version of the code can be run here

\$\endgroup\$
2
  • 1
    \$\begingroup\$ In the future it would be better to add a follow up question with a link back to the original question rather than editing the question. Changing the question may cause answer invalidation. Everyone needs to be able to see what the reviewer was referring to. What to do after the question has been answered. \$\endgroup\$
    – pacmaninbw
    Commented Aug 1, 2021 at 12:51
  • \$\begingroup\$ @pacmaninbw You are correct bro. Will keep in mind hereafter. \$\endgroup\$ Commented Aug 1, 2021 at 12:54

1 Answer 1

4
\$\begingroup\$

A couple of comments on your answer:

Code style

  • If you've already stored the pattern.length(), why keep calling the method? Just be consistent and use that variable. Same goes for str.length(). I'd just store it and use it.

  • I would store str.length()-(pattern.length()-1 in a variable an name it something like numOfCasesToTest, to make it clearer.

  • Check parenthesis usage:

for(int i=0; i<((str.length())-(pattern.length()-1));i++){

Could be written as:

for(int i = 0; i < str.length() - pattern.length() - 1; i++){

Unnecessary parenthesis makes code harder to read. For further details see Operator precedence.

  • Keep your spacing consistent. While some people prefer not having spaces between some or most operators, I'd rather have them, as I think it makes the code more readable. Whichever way you prefer, just keep consistent usage in the code. For example:
i<patternSumArray.length // no space around comparison operator
patternSumArray[i] == patternSumToCheck // space around comparison operator

Algorithm

That was all code style related. Now, a comment on your algorithm: Why calculate all posible cases sum, and only after that look for a match? This way, you are iterating over the whole str whereas it might not be necessary.

I see how you may have reached that solution: you thought process was as described in the algorithm. But you could just ommit the part in which you store the sums, and compare with the match straight away.

For example, say you had a string such as

Abcsomelongstringwithlotsofconentinit

And you are looking for the pattern Abc. Then, why bother doing of all those calculations? Simply, after you calculate one of those sums, compare it with the pattern sum.

Still (maybe I'm being naive), but I do not see how calculating the sum, comparing it, and then actually comparing the charaters is better than just comparing the characters in the first place

\$\endgroup\$
6
  • 3
    \$\begingroup\$ "I do not see ..." - Because you don't compare the characters at all if the sum already doesn't match. Rabin-Karp is better than sum, though. \$\endgroup\$ Commented Jul 31, 2021 at 16:51
  • \$\begingroup\$ @m-alorda Thanks for the review, will definitely follow the instructions. \$\endgroup\$ Commented Jul 31, 2021 at 21:10
  • 3
    \$\begingroup\$ Using the sum as a simple kind of rolling hash could in principle be an improvement on the naive algorithm, except you are calculating the sum of each substring independently, in the naive way. For there to be a potential performance benefit, the sum should be calculated using a moving window approach where you add the new character on the right of the window and subtract the character on the left of the window in order to move the window along by one without needing a loop over the whole window to recompute the sum. \$\endgroup\$
    – kaya3
    Commented Jul 31, 2021 at 22:10
  • \$\begingroup\$ @kaya3, Great suggestion! Thanks! \$\endgroup\$ Commented Aug 1, 2021 at 2:26
  • \$\begingroup\$ @kelly-bundy Oh, I see now. So the idea is prefering to do some calculations instead of doing more comparisons. Also, thanks for the link! \$\endgroup\$ Commented Aug 1, 2021 at 7:15

Not the answer you're looking for? Browse other questions tagged or ask your own question.