2
\$\begingroup\$

I'm working on a string manipulation code in C++ for my internship project, where I have to find a substring within a string and essentially replace it and some characters following it with a newly computed sequence based on other inputs (here, the string vector 'patterns'). I have been trying to find a better way to manipulate the string in order to increase efficiency (reduce space cost and possibly also reduce time complexity further). This is a sample of what I've done so far(Sorry if the strings look a bit weird, this was the best simple example I could come up with at the moment):

// some input variables

std::string& originalString ("__  BB  ++  AA  __  AA  __  CC  __  DD");
std::map<std::string, std::vector<std::string>> patternMap = {
                                   {"AA", {"aaa","AAA"}}, 
                                   {"BB", {"bbb","BBB"}},
                               };

std::size_t location = 0;
while ((location = originalString.find("__", location)) != std::string::npos) {
 const std::string subString = originalString.substr(location+4, location+6);
 std::map<std::string, std::vector<std::string>>::iterator patternsIterator = patternMap.find(subString);
 if (patternsIterator != patternMap.end()) {
      const std::vector<std::string> patterns = patternsIterator -> second;
      std::string alternateSubString("*");
      for (const std::string& item : patterns) {
        alternateSubString.append(item + "__" );
      }
      alternateSubString.replace(alternateSubString.size() - 5, 5,"*");
      originalString.replace(location+4, 2, alternateSubString);
      location += alternateSubString.size();
  }
  ++location;
}// Expected  value of the original string should change to: "__  *bbb__BBB*  ++  AA  __  *aaa__AAA*  __  CC  __  DD"

Any tips regarding a better way, maybe do away with the variable alternateSubString? Trying to achieve some space saving and then time optimization here. I'd prefer the original string variable to get modified, and not create a new variable. But I'm a beginner in C++ so any tips for improvement will be appreciated.

\$\endgroup\$
1
  • \$\begingroup\$ The formatting of your code make it very difficult to read. As far as I can tell you are looking at indexes into strings that may or may not exist. I'd love to offer advice, but sorry its too difficult to understand. \$\endgroup\$ Commented May 9, 2022 at 15:02

1 Answer 1

2
\$\begingroup\$

Create a function that does the replacing

It is good practice to create a function for what you are doing. A function will have a name that describes what it is doing, and it clearly defines the input and output via the function parameters and return value. It can be as simple as:

std::string replacePatterns(std::string text, std::map<...> patterns) {
    ...
}

Use more type deduction

Types can get very long if you have to write them out fully, like the type of patternsIterator. However, since C++11 there is auto, so you can use that to avoid having to write the type explicitly. Apart from making the code more concise, it also even avoids some classes of bugs, like where the right hand side of an assignment can be implicitly cast to the wrong type on the left hand side. So just write:

auto patternsIterator = patternMap.find(subString);

There are also some other ways to avoid having to write types that don't involve auto. For example, std::string objects have a static member npos, so you can write:

while ((location = originalString.find("__", location)) != originalString.npos) {

While it didn't save any typing here, the advantage is that you don't have to remember and spell out the type of originalString, and that also makes changing the type of originalString to something else later on easier (like if you would want to switch to using std::string_view in C++17).

Avoid hardcoding and magic numbers

The goal seems to be finding substrings in the patternMap and replacing it with other strings, but it looks like it only checks strings that are 4 characters after the start of "__". Perhaps that is indeed exactly what you want, but what if you want to change this "__" to something else? You could search and replace that string, but the two occurences in your code have different meanings. And then there are things like location+4 that have to be adjusted if the length of the separator changes. But even if you don't want to ever change this, at least define this "__" as a static const variable, and derive things like offsets from that variable:

static const std::string separator = "__";
...
while ((location = originalString.find(separator, location)) != originalString.npos) {
    auto subString = originalString.substr(location + separator.size() + 2, 2);
    ...

There are still some magic numbers in there, they both look the same but they are not: the first 2 is the number of spaces after the separator to skip, the second 2 is the length of the substring to match. You should also turn these into static const (or even better, static constexpr) variables.

But you might avoid some constants altogether. Consider doing:

static const std::string separator = "__  ";
static const std::size_t subStringSize = 2;
...
auto subString = originalString.substr(location + separator.size(), subStringSize);

Precompute the replacement strings

The code that replaces the patterns in the original string has several responsibilities: it needs to find the patterns, calculate their replacements, and then substitute the original patterns for their replacements. But it looks like the replacements could be precomputed, so that the function that does the replacing has less responsibilities, and will therefore be simpler and have less chance of bugs. You could do the precomputation in a separate function:

std::map<std::string, std::string> computeReplacements(const std::map<...>& patternMap) {
    static const std::string surrounding = "*";
    static const std::string separator = "__";

    std::map<std::string, std::string> result;

    for (const auto& pattern: patternMap) {
         std::string replacement = surrounding;
         for (const auto& item: pattern.second) {
               replacement.append(item + separator);
         }
         replacement.erase(replacement.rfind(separator));
         replacement.append(surrounding);
         result[pattern.first] = replacement;
    }

    return result;
}

And then you can do something like:

auto replacements = computeReplacements(patternMap);
auto result = replacePatterns(originalString, replacements);

Another advantage of this is that it can be a lot more efficient if you have to replace a lot of things in the original strings.

Make sure you handle corner cases

There are some subtle bugs in your code as well as in my code above. Consider the case where you have a pattern with an empty list of replacements, like {"CC", {}}. Both your and my code assume that the vector of strings has at least one element, because it tries to replace the last "__" added with "*". But there was never added any "__", this will fail, potentially writing out of bounds of the string holding the replacement. Either you should handle this case, or if you explicitly want to forbid this, it might be very helpful to add an assert() statement to the code, like:

assert(!patterns.empty());

Another potential issue is if the original string ends with "__". In your code, this will cause teh call to originalString.substr() to read out of bounds.

Finally, consider whether you still need to do ++location if you found a match and already did location += alternateSubString.size().

\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.