I am trying to split a string and put the tokens into a vector.

However, I also want to keep an empty token whenever there are consecutive delimiters.

For example:

string mystring = "::aa;;bb;cc;;c";

I would like to tokenize this string on the delimiters : and ;, but wherever two delimiters are adjacent, such as :: and ;;, I would like to push an empty string into my vector.

So my desired output for this string is:

"" (empty)
aa
"" (empty)
bb
cc
"" (empty)
c

Also, my requirement is not to use the Boost library.

If anyone could lend me an idea, I would appreciate it.

Thanks

Code that tokenizes a string but does not include the empty tokens:

void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find first "non-delimiter".
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find next "non-delimiter"
        pos = str.find_first_of(delimiters, lastPos);
    }
}
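
To see why the function above drops empty tokens, here is a minimal sketch of the same logic that returns the vector instead of filling an output parameter. The name tokenize_skip_empty is hypothetical and not part of the original code; everything else follows the question's loop.

```cpp
#include <string>
#include <vector>

// Hypothetical name: same logic as the question's Tokenize,
// but returning the tokens so the result is easy to inspect.
std::vector<std::string> tokenize_skip_empty(const std::string& str,
                                             const std::string& delimiters)
{
    std::vector<std::string> tokens;
    // Start of the first token: first character that is not a delimiter.
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // End of the first token: first delimiter at or after lastPos.
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // find_first_not_of jumps over a whole run of delimiters at once,
        // so "::" or ";;" can never produce an empty token.
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
    return tokens;
}
```

Calling tokenize_skip_empty("::aa;;bb;cc;;c", ":;") gives the four tokens aa, bb, cc, c. The leading :: and each ;; are skipped as a single run.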
  • Have you tried anything?
    – Amit
    Commented Jun 12, 2015 at 7:40
  • I tried the code above to tokenize my string and that works, but it excludes empty tokens. Commented Jun 12, 2015 at 7:47
  • Why don't you add a tokens.push_back(""); just after tokens.push_back(str.substr(lastPos, pos - lastPos)); ?
    – Bastien
    Commented Jun 12, 2015 at 7:48
  • I guess it's not possible; what if it is a different string? Commented Jun 12, 2015 at 7:51
  • Try replacing find_first_not_of with something else (perhaps a simple increment by 1). Commented Jun 12, 2015 at 7:55

2 Answers

You can make your algorithm work with some simple changes. First, don't skip delimiters at the beginning. Then, instead of skipping delimiters in the middle of the string, just increment the position by one. Also, your npos check should ensure that both positions are not npos, so it should be && instead of ||.

void Tokenize(const string& str,vector<string>& tokens, const string& delimiters)
{
    // Start at the beginning
    string::size_type lastPos = 0;
    // Find position of the first delimiter
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    // While we still have string to read
    while (string::npos != pos && string::npos != lastPos)
    {
        // Found a token, add it to the vector
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Look at the next token instead of skipping delimiters
        lastPos = pos+1;
        // Find the position of the next delimiter
        pos = str.find_first_of(delimiters, lastPos);
    }

    // Push the last token
    tokens.push_back(str.substr(lastPos, pos - lastPos));
}
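
As a quick check of the approach above, here is a sketch of the same loop written as a function that returns its result. The name tokenize_keep_empty is hypothetical, and the final token is pushed with str.substr(lastPos), which is equivalent to the last push_back above once pos is npos.

```cpp
#include <string>
#include <vector>

// Hypothetical name: index-based split that keeps empty tokens.
std::vector<std::string> tokenize_keep_empty(const std::string& str,
                                             const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type lastPos = 0;
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (pos != std::string::npos)
    {
        // Empty when two delimiters are adjacent (pos == lastPos).
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Step over exactly one delimiter instead of a whole run.
        lastPos = pos + 1;
        pos = str.find_first_of(delimiters, lastPos);
    }

    // Everything after the last delimiter; empty if the string ends with one.
    tokens.push_back(str.substr(lastPos));
    return tokens;
}
```

For "::aa;;bb;cc;;c" with delimiters ":;" this yields eight tokens: "", "", aa, "", bb, cc, "", c. A leading :: produces two empty tokens under this rule (one before the first : and one between the two), which is one more than the list in the question shows. An input ending in a delimiter, such as "::aa;;bb;cc;;c:", gets one extra empty token at the end.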
  • "// Find next "non-delimiter"" does not describe what pos = str.find_first_of(delimiters, lastPos); does, and you don't add the token after the last delimiter (or for a str without any delimiter). General approach is sound though. Commented Jun 12, 2015 at 8:09
  • Shouldn't you add tokens.push_back(str.substr(lastPos, pos - lastPos)); at the end to add the last string if not empty?
    – Bastien
    Commented Jun 12, 2015 at 8:09
  • Wow, amazing, it works. I suspect that || is the reason I was getting an infinite loop. I just have one question: does this also work when the last char of a string is a delimiter? For example, with "::aa;;bb;cc;;c:" the last token should be "". Commented Jun 12, 2015 at 8:15
  • @XDProgrammer yes, if the final character is a delimiter, there will be an empty string at the end of the vector. Commented Jun 12, 2015 at 8:19
  • @TartanLIama thanks man, I hadn't seen that last push_back you added before I posted. Everything works as expected. Anyway, I need to study this code, as I don't fully understand yet what's going on. All the similar topics I found were using the Boost library. It does look easy when you use that library, but I won't understand it if I just use that. Again, thanks. Commented Jun 12, 2015 at 8:24

I have a version using iterators:

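#include <algorithm> // std::find_first_of
#include <string>
#include <vector>

std::vector<std::string> split_from(const std::string& s
    , const std::string& d, unsigned r = 20)
{
    std::vector<std::string> v;
    v.reserve(r); // r is only a guess at the number of tokens

    auto pos = s.begin();
    auto end = pos;

    while(end != s.end())
    {
        // end points at the next delimiter, or s.end() if none is left
        end = std::find_first_of(pos, s.end(), d.begin(), d.end());
        v.emplace_back(pos, end); // empty when pos == end
        // step past the delimiter, but never past s.end()
        if(end != s.end())
            pos = end + 1;
    }

    return v;
}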

Using your interface:

void Tokenize(const std::string& s, std::vector<std::string>& tokens
    , const std::string& delims)
{
    auto pos = s.begin();
    auto end = pos;

    while(end != s.end())
    {
        // end points at the next delimiter, or s.end() if none is left
        end = std::find_first_of(pos, s.end(), delims.begin(), delims.end());
        tokens.emplace_back(pos, end);
        // step past the delimiter, but never past s.end()
        if(end != s.end())
            pos = end + 1;
    }
}
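
The edge cases of the iterator approach can be checked with a small sketch like the one below. The name split_keep_empty is hypothetical. This variant only advances past a delimiter when one was actually found, on the assumption that incrementing an iterator beyond s.end() should be avoided.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical name: iterator-based split that keeps empty tokens.
std::vector<std::string> split_keep_empty(const std::string& s,
                                          const std::string& delims)
{
    std::vector<std::string> v;
    auto pos = s.begin();
    auto end = pos;

    while (end != s.end())
    {
        // Next delimiter at or after pos, or s.end() if there is none.
        end = std::find_first_of(pos, s.end(), delims.begin(), delims.end());
        // Constructs the token from the range [pos, end); empty if pos == end.
        v.emplace_back(pos, end);
        // Only step over a delimiter that exists.
        if (end != s.end())
            pos = end + 1;
    }
    return v;
}
```

With this sketch, "a:" splits into a and "", so a trailing delimiter still produces a trailing empty token. An empty input string returns an empty vector, because the loop body never runs. That differs from the index-based version, whose final push_back always adds one token.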